bw-video-len_129-step_20-num-2.log 41.4 KB
jerrrrry committed
W0202 14:28:00.488000 23500 lib/python3.10/dist-packages/torch/distributed/run.py:793] 
W0202 14:28:00.488000 23500 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
W0202 14:28:00.488000 23500 lib/python3.10/dist-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0202 14:28:00.488000 23500 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=129, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=2, ring_degree=1)
2026-02-02 14:28:07.474 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:28:07.495322 23534 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:28:07.495340 23534 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:28:07 [parallel_state.py:200] world_size=2 rank=0 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:28:07.495790 23534 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55c1ecaf0560, SPLIT_COLOR: 22836467197190088, PG Name: 1
I0202 14:28:07.495801 23534 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=129, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=2, ring_degree=1)
2026-02-02 14:28:08.488 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:28:08.502862 23535 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:28:08.502882 23535 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:28:08 [parallel_state.py:200] world_size=2 rank=1 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:28:08.503347 23535 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x559d8ecec4e0, SPLIT_COLOR: 22836467197190088, PG Name: 1
I0202 14:28:08.503357 23535 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.509886 23535 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 5
I0202 14:28:08.509897 23535 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.510100 23534 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 3
I0202 14:28:08.510125 23534 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.510986 23535 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 9 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 9
I0202 14:28:08.510998 23535 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 9 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.511489 23534 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 7 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 7
I0202 14:28:08.511502 23534 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 7 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.511899 23535 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 13 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 13
I0202 14:28:08.511910 23535 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 13 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.512621 23534 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 11 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 11
I0202 14:28:08.512631 23534 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 11 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.512837 23535 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 16 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 16
I0202 14:28:08.512848 23535 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 16 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.513136 23535 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x559d8ecec4e0, SPLIT_COLOR: 22836467197190088, PG Name: 17
I0202 14:28:08.513146 23535 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.513356 23535 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 19 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 19
I0202 14:28:08.513365 23535 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 19 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.513576 23535 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x559d8ecec4e0, SPLIT_COLOR: 22836467197190088, PG Name: 20
I0202 14:28:08.513585 23535 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.513830 23534 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 15 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 15
I0202 14:28:08.513841 23534 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 15 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.514124 23534 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55c1ecaf0560, SPLIT_COLOR: 22836467197190088, PG Name: 17
I0202 14:28:08.514134 23534 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.514326 23534 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 18 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 18
I0202 14:28:08.514336 23534 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 18 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.514765 23534 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55c1ecaf0560, SPLIT_COLOR: 22836467197190088, PG Name: 20
I0202 14:28:08.514775 23534 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.515851 23534 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 22 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 22
I0202 14:28:08.515861 23534 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 22 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.516067 23535 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 24 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 24
I0202 14:28:08.516077 23535 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 24 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.516883 23534 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 26 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55c1ecaf0560, SPLIT_COLOR: 22836467197190088, PG Name: 26
I0202 14:28:08.516893 23534 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 26 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:28:08.516945 23535 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 26 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x559d8ecec4e0, SPLIT_COLOR: 22836467197190088, PG Name: 26
I0202 14:28:08.516955 23535 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 26 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2026-02-02 14:28:08.517 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:28:08.517 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:28:09.078 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:28:09.127 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:28:22.406 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:28:24.418 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:28:24.559 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]2026-02-02 14:28:25.690 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:28:27.788 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:28:27.907 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:10,  3.62s/it]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:11,  3.83s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.63s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:10<00:03,  3.42s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.20s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.69s/it]

Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.90s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:12<00:04,  4.26s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00,  2.80s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00,  3.26s/it]
2026-02-02 14:28:42.720 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:28:45.077 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:28:45.542 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:28:45.739 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:28:45.788 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:28:45.877 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:28:45.970 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
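The n_tokens figure in the dump above is consistent with the run's other parameters. A hedged sketch of the arithmetic, assuming the 884-16c VAE compresses 4x temporally and 8x8 spatially and the DiT patchifies with a 1x2x2 patch (both assumptions about the model, not values printed in this log):

```python
# Sketch: deriving n_tokens = 118800 from the logged run parameters.
# Assumed (not in the log): 884-16c VAE = 4x temporal / 8x8 spatial
# compression; DiT patch size 1x2x2.
height, width, video_length = 1280, 720, 129

latent_t = (video_length - 1) // 4 + 1   # 33 latent frames
latent_h = height // 8                   # 160
latent_w = width // 8                    # 90

patch_t, patch_h, patch_w = 1, 2, 2      # assumed patchify factors
n_tokens = (latent_t // patch_t) * (latent_h // patch_h) * (latent_w // patch_w)
print(n_tokens)  # 118800
```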
2026-02-02 14:28:48.156 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:28:50.372 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(

  0%|          | 0/2 [00:00<?, ?it/s]2026-02-02 14:28:50.830 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:28:51.050 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:28:51.097 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:28:51.188 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:28:51.283 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(

  0%|          | 0/2 [00:00<?, ?it/s]I0202 14:28:58.518110 23534 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.07409 ms
I0202 14:28:58.518260 23535 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 5917.35 ms
I0202 14:28:58.861362 23534 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL created ncclComm_ 0x55c25776a7b0 on CUDA device: 
I0202 14:28:58.861373 23535 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL created ncclComm_ 0x559deae863a0 on CUDA device: 
I0202 14:28:58.861377 23534 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 17 Rank 0] NCCL_DEBUG: N/A
I0202 14:28:58.861385 23535 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 17 Rank 1] NCCL_DEBUG: N/A
I0202 14:29:37.835897 23534 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.07931 ms
I0202 14:29:37.835973 23535 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 1.57874 ms
I0202 14:29:38.102166 23535 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL created ncclComm_ 0x559deb117030 on CUDA device: 
I0202 14:29:38.102180 23535 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 20 Rank 1] NCCL_DEBUG: N/A
I0202 14:29:38.102178 23534 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL created ncclComm_ 0x55c2579faa00 on CUDA device: 
I0202 14:29:38.102191 23534 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 20 Rank 0] NCCL_DEBUG: N/A
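The NCCL communicators created above pair the two ranks for sequence parallelism: the launch args specify ulysses_degree=2, ring_degree=1, i.e. a DeepSpeed-Ulysses-style split in which each rank holds a shard of the sequence and an all-to-all regroups tokens by attention head. A sketch of the bookkeeping only (the mechanism itself is an assumption about the implementation):

```python
# With ulysses_degree=2 and ring_degree=1, the sequence dimension is
# split across the two ranks, so each rank processes half the tokens
# outside attention and exchanges shards via all-to-all inside it.
n_tokens, ulysses_degree = 118800, 2
tokens_per_rank = n_tokens // ulysses_degree
print(tokens_per_rank)  # 59400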

 50%|█████     | 1/2 [00:47<00:47, 47.46s/it]
 50%|█████     | 1/2 [00:41<00:41, 41.64s/it]
100%|██████████| 2/2 [01:26<00:00, 42.35s/it]
100%|██████████| 2/2 [01:26<00:00, 43.12s/it]

100%|██████████| 2/2 [01:20<00:00, 39.96s/it]
100%|██████████| 2/2 [01:20<00:00, 40.21s/it]
2026-02-02 14:31:45.889 | INFO     | hyvideo.inference:predict:671 - Success, time: 179.9184126853943
2026-02-02 14:31:45.890 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:31:45.981 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:31:46.168 | INFO     | hyvideo.inference:predict:671 - Success, time: 174.8853166103363
2026-02-02 14:31:46.169 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)

  0%|          | 0/20 [00:00<?, ?it/s]2026-02-02 14:31:46.287 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0

  0%|          | 0/20 [00:00<?, ?it/s]
  5%|▌         | 1/20 [00:39<12:24, 39.17s/it]
  5%|▌         | 1/20 [00:38<12:18, 38.85s/it]
 10%|█         | 2/20 [01:18<11:41, 38.97s/it]
 10%|█         | 2/20 [01:17<11:39, 38.84s/it]
 15%|█▌        | 3/20 [01:56<11:01, 38.92s/it]
 15%|█▌        | 3/20 [01:56<11:00, 38.85s/it]
 20%|██        | 4/20 [02:35<10:22, 38.90s/it]
 20%|██        | 4/20 [02:35<10:21, 38.86s/it]
 25%|██▌       | 5/20 [03:14<09:43, 38.89s/it]
 25%|██▌       | 5/20 [03:14<09:42, 38.86s/it]
 30%|███       | 6/20 [03:53<09:04, 38.87s/it]
 30%|███       | 6/20 [03:53<09:03, 38.86s/it]
 35%|███▌      | 7/20 [04:32<08:25, 38.87s/it]
 35%|███▌      | 7/20 [04:31<08:25, 38.85s/it]
 40%|████      | 8/20 [05:11<07:46, 38.87s/it]
 40%|████      | 8/20 [05:10<07:46, 38.86s/it]
 45%|████▌     | 9/20 [05:49<07:07, 38.85s/it]
 45%|████▌     | 9/20 [05:49<07:07, 38.85s/it]
 50%|█████     | 10/20 [06:29<06:29, 38.93s/it]
 50%|█████     | 10/20 [06:28<06:29, 38.93s/it]
 55%|█████▌    | 11/20 [07:07<05:50, 38.98s/it]
 55%|█████▌    | 11/20 [07:08<05:50, 38.98s/it]
 60%|██████    | 12/20 [07:46<05:12, 39.02s/it]
 60%|██████    | 12/20 [07:47<05:12, 39.02s/it]
 65%|██████▌   | 13/20 [08:26<04:33, 39.03s/it]
 65%|██████▌   | 13/20 [08:26<04:33, 39.03s/it]
 70%|███████   | 14/20 [09:05<03:54, 39.04s/it]
 70%|███████   | 14/20 [09:05<03:54, 39.04s/it]
 75%|███████▌  | 15/20 [09:44<03:15, 39.05s/it]
 75%|███████▌  | 15/20 [09:44<03:15, 39.05s/it]
 80%|████████  | 16/20 [10:23<02:36, 39.06s/it]
 80%|████████  | 16/20 [10:23<02:36, 39.06s/it]
 85%|████████▌ | 17/20 [11:02<01:57, 39.07s/it]
 85%|████████▌ | 17/20 [11:02<01:57, 39.07s/it]
 90%|█████████ | 18/20 [11:41<01:18, 39.08s/it]
 90%|█████████ | 18/20 [11:41<01:18, 39.08s/it]
 95%|█████████▌| 19/20 [12:20<00:39, 39.08s/it]
 95%|█████████▌| 19/20 [12:20<00:39, 39.08s/it]
100%|██████████| 20/20 [12:59<00:00, 39.08s/it]
100%|██████████| 20/20 [12:59<00:00, 39.08s/it]
100%|██████████| 20/20 [12:59<00:00, 38.98s/it]

100%|██████████| 20/20 [12:59<00:00, 39.00s/it]
2026-02-02 14:46:14.423 | INFO     | hyvideo.inference:predict:671 - Success, time: 868.1352031230927
2026-02-02 14:46:14.460 | INFO     | hyvideo.inference:predict:671 - Success, time: 868.4789748191833
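The ~868 s "Success" times are consistent with the ~39 s/it rate the progress bars report: denoising dominates, with the remainder being fixed overhead (text encoding, VAE decode, collectives setup). A quick check:

```python
# Sanity check on the reported wall time for the 20-step run.
steps, sec_per_it = 20, 39.08          # from the progress bars above
total = 868.14                         # reported wall time, rank 0
denoise = steps * sec_per_it
print(round(denoise, 1), round(total - denoise, 1))  # 781.6 86.5
```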
I0202 14:46:15.279754 23535 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL destructor entered.
I0202 14:46:15.280123 23535 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 1] Launching ProcessGroupNCCL abort asynchronously.
I0202 14:46:15.280300 23535 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 1] future is successfully executed for: ProcessGroup abort
I0202 14:46:15.280306 23535 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL aborts successfully.
I0202 14:46:15.280313 23535 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL watchdog thread joined.
I0202 14:46:15.280411 23535 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL heart beat monitor thread joined.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
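One way to apply the warning's second suggestion is to pin the variable in the launch script before any tokenizer is constructed (and before the fork). A minimal sketch; `setdefault` preserves an explicit user setting if one already exists:

```python
import os

# Silence the huggingface/tokenizers fork warning by disabling
# tokenizer-internal parallelism up front.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
print(os.environ["TOKENIZERS_PARALLELISM"])
```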
2026-02-02 14:46:17.155 | INFO     | __main__:main:72 - Sample save to: ./results/2026-02-02-14:46:14_seed42_A cat walks on the grass, realistic style..mp4
I0202 14:46:18.403568 23534 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL destructor entered.
W0202 14:46:18.403652 23534 ProcessGroupNCCL.cpp:1279] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
I0202 14:46:18.403669 23534 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 0] Launching ProcessGroupNCCL abort asynchronously.
I0202 14:46:18.403894 23534 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 0] future is successfully executed for: ProcessGroup abort
I0202 14:46:18.403903 23534 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL aborts successfully.
I0202 14:46:18.403931 23534 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL watchdog thread joined.
I0202 14:46:18.404124 23534 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL heart beat monitor thread joined.
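The rank-0 warning above asks the application to call destroy_process_group before normal exit so pending NCCL operations finish. A minimal teardown sketch; the guards make it a safe no-op outside a live distributed run (and when torch is absent):

```python
# Explicit process-group teardown, per the ProcessGroupNCCL warning.
# Sketch only: guards keep this a no-op when nothing was initialized.
try:
    import torch.distributed as dist
except ImportError:          # torch not installed: nothing to tear down
    dist = None

def shutdown_distributed() -> bool:
    """Destroy the default process group if one is live; report whether we did."""
    if dist is not None and dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()
        return True
    return False

print(shutdown_distributed())  # False outside a distributed run
```

In the generation script this would run after the final sample is saved, before the process exits.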