W0202 14:14:12.618000 10287 lib/python3.10/dist-packages/torch/distributed/run.py:793] 
W0202 14:14:12.618000 10287 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
W0202 14:14:12.618000 10287 lib/python3.10/dist-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0202 14:14:12.618000 10287 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=33, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=2, ring_degree=1)
2026-02-02 14:14:17.479 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:14:17.495100 10321 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:14:17.495119 10321 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:14:17 [parallel_state.py:200] world_size=2 rank=0 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:14:17.495599 10321 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x560712355bc0, SPLIT_COLOR: 22836467197190088, PG Name: 1
I0202 14:14:17.495608 10321 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=33, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=2, ring_degree=1)
2026-02-02 14:14:17.823 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:14:17.837303 10322 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:14:17.837325 10322 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:14:17 [parallel_state.py:200] world_size=2 rank=1 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:14:17.837966 10322 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x563b7a34bac0, SPLIT_COLOR: 22836467197190088, PG Name: 1
I0202 14:14:17.837976 10322 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.844038 10322 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 5
I0202 14:14:17.844046 10322 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.844133 10321 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 3
I0202 14:14:17.844147 10321 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.844980 10322 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 9 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 9
I0202 14:14:17.844990 10322 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 9 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.845337 10321 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 7 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 7
I0202 14:14:17.845350 10321 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 7 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.846011 10322 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 13 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 13
I0202 14:14:17.846024 10322 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 13 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.846212 10321 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 11 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 11
I0202 14:14:17.846223 10321 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 11 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.847054 10322 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 16 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 16
I0202 14:14:17.847065 10322 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 16 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.847277 10321 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 15 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 15
I0202 14:14:17.847280 10322 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x563b7a34bac0, SPLIT_COLOR: 22836467197190088, PG Name: 17
I0202 14:14:17.847290 10321 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 15 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.847291 10322 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.847499 10322 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 19 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 19
I0202 14:14:17.847507 10322 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 19 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.847539 10321 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x560712355bc0, SPLIT_COLOR: 22836467197190088, PG Name: 17
I0202 14:14:17.847550 10321 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.847719 10322 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x563b7a34bac0, SPLIT_COLOR: 22836467197190088, PG Name: 20
I0202 14:14:17.847729 10322 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.847836 10321 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 18 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 18
I0202 14:14:17.847846 10321 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 18 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.848086 10321 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x560712355bc0, SPLIT_COLOR: 22836467197190088, PG Name: 20
I0202 14:14:17.848096 10321 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.849359 10322 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 24 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 24
I0202 14:14:17.849367 10322 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 24 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.849380 10321 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 22 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 22
I0202 14:14:17.849390 10321 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 22 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.850406 10321 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 26 Rank 0] ProcessGroupNCCL initialization options: size: 2, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x560712355bc0, SPLIT_COLOR: 22836467197190088, PG Name: 26
I0202 14:14:17.850417 10321 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 26 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:14:17.850459 10322 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 26 Rank 1] ProcessGroupNCCL initialization options: size: 2, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x563b7a34bac0, SPLIT_COLOR: 22836467197190088, PG Name: 26
I0202 14:14:17.850469 10322 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 26 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2026-02-02 14:14:17.850 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:14:17.850 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:14:18.469 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:14:18.566 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:14:32.421 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:14:34.427 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:14:34.558 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]2026-02-02 14:14:36.374 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:10,  3.66s/it]2026-02-02 14:14:38.396 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:14:38.543 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.61s/it]Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:10,  3.53s/it]Loading checkpoint shards:  75%|███████▌  | 3/4 [00:10<00:03,  3.43s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.21s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.70s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:06<00:06,  3.48s/it]Loading checkpoint shards:  75%|███████▌  | 3/4 [00:10<00:03,  3.72s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.38s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.83s/it]
2026-02-02 14:14:51.676 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:14:54.140 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:14:54.547 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:14:54.743 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:14:54.777 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:14:54.863 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:14:54.890 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:14:56.410 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:14:58.939 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:14:59.363 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:14:59.572 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:14:59.614 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:14:59.706 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:14:59.732 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]I0202 14:15:02.029738 10321 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.104829 ms
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]I0202 14:15:06.722991 10322 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.27926 ms
I0202 14:15:07.066458 10322 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 17 Rank 1] ProcessGroupNCCL created ncclComm_ 0x563bc687fb60 on CUDA device: 
I0202 14:15:07.066471 10322 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 17 Rank 1] NCCL_DEBUG: N/A
I0202 14:15:07.066511 10321 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 17 Rank 0] ProcessGroupNCCL created ncclComm_ 0x56076060dd50 on CUDA device:  
I0202 14:15:07.066557 10321 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 17 Rank 0] NCCL_DEBUG: N/A
I0202 14:15:11.860304 10321 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.109209 ms
I0202 14:15:11.860379 10322 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 42.9653 ms
I0202 14:15:12.129735 10322 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 20 Rank 1] ProcessGroupNCCL created ncclComm_ 0x563bc6b12a90 on CUDA device: 
I0202 14:15:12.129738 10321 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 20 Rank 0] ProcessGroupNCCL created ncclComm_ 0x5607608a24a0 on CUDA device:  
I0202 14:15:12.129750 10322 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 20 Rank 1] NCCL_DEBUG: N/A
I0202 14:15:12.129751 10321 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 20 Rank 0] NCCL_DEBUG: N/A
 50%|█████     | 1/2 [00:12<00:12, 12.08s/it] 50%|█████     | 1/2 [00:07<00:07,  7.44s/it]100%|██████████| 2/2 [00:11<00:00,  5.67s/it]100%|██████████| 2/2 [00:11<00:00,  5.94s/it]
100%|██████████| 2/2 [00:16<00:00,  7.61s/it]100%|██████████| 2/2 [00:16<00:00,  8.28s/it]
2026-02-02 14:15:31.479 | INFO     | hyvideo.inference:predict:671 - Success, time: 31.746337175369263
2026-02-02 14:15:31.479 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:15:31.497 | INFO     | hyvideo.inference:predict:671 - Success, time: 36.6064395904541
2026-02-02 14:15:31.497 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:15:31.511 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:15:31.529 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
  0%|          | 0/20 [00:00<?, ?it/s]  0%|          | 0/20 [00:00<?, ?it/s]  5%|▌         | 1/20 [00:04<01:25,  4.49s/it]  5%|▌         | 1/20 [00:04<01:25,  4.51s/it] 10%|█         | 2/20 [00:08<01:20,  4.47s/it] 10%|█         | 2/20 [00:08<01:20,  4.49s/it] 15%|█▌        | 3/20 [00:13<01:16,  4.49s/it] 15%|█▌        | 3/20 [00:13<01:16,  4.50s/it] 20%|██        | 4/20 [00:17<01:11,  4.50s/it] 20%|██        | 4/20 [00:18<01:12,  4.50s/it] 25%|██▌       | 5/20 [00:22<01:07,  4.50s/it] 25%|██▌       | 5/20 [00:22<01:07,  4.51s/it] 30%|███       | 6/20 [00:27<01:03,  4.51s/it] 30%|███       | 6/20 [00:27<01:03,  4.51s/it] 35%|███▌      | 7/20 [00:31<00:58,  4.51s/it] 35%|███▌      | 7/20 [00:31<00:58,  4.51s/it] 40%|████      | 8/20 [00:36<00:54,  4.51s/it] 40%|████      | 8/20 [00:36<00:54,  4.51s/it] 45%|████▌     | 9/20 [00:40<00:49,  4.51s/it] 45%|████▌     | 9/20 [00:40<00:49,  4.51s/it] 50%|█████     | 10/20 [00:45<00:45,  4.51s/it] 50%|█████     | 10/20 [00:45<00:45,  4.51s/it] 55%|█████▌    | 11/20 [00:49<00:40,  4.51s/it] 55%|█████▌    | 11/20 [00:49<00:40,  4.51s/it] 60%|██████    | 12/20 [00:54<00:36,  4.51s/it] 60%|██████    | 12/20 [00:54<00:36,  4.51s/it] 65%|██████▌   | 13/20 [00:58<00:31,  4.51s/it] 65%|██████▌   | 13/20 [00:58<00:31,  4.51s/it] 70%|███████   | 14/20 [01:03<00:27,  4.51s/it] 70%|███████   | 14/20 [01:03<00:27,  4.51s/it] 75%|███████▌  | 15/20 [01:07<00:22,  4.53s/it] 75%|███████▌  | 15/20 [01:07<00:22,  4.53s/it] 80%|████████  | 16/20 [01:12<00:18,  4.52s/it] 80%|████████  | 16/20 [01:12<00:18,  4.52s/it] 85%|████████▌ | 17/20 [01:16<00:13,  4.52s/it] 85%|████████▌ | 17/20 [01:16<00:13,  4.52s/it] 90%|█████████ | 18/20 [01:21<00:09,  4.51s/it] 90%|█████████ | 18/20 [01:21<00:09,  4.51s/it] 95%|█████████▌| 19/20 [01:25<00:04,  4.51s/it] 95%|█████████▌| 19/20 [01:25<00:04,  4.51s/it]100%|██████████| 20/20 [01:30<00:00,  4.51s/it]100%|██████████| 20/20 [01:30<00:00,  4.51s/it]
100%|██████████| 20/20 [01:30<00:00,  4.51s/it]100%|██████████| 20/20 [01:30<00:00,  4.51s/it]
2026-02-02 14:17:16.145 | INFO     | hyvideo.inference:predict:671 - Success, time: 104.63348984718323
2026-02-02 14:17:16.170 | INFO     | hyvideo.inference:predict:671 - Success, time: 104.64064598083496
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
I0202 14:17:16.836709 10322 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL destructor entered.
I0202 14:17:16.836776 10322 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 1] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:17:16.836982 10322 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 1] future is successfully executed for: ProcessGroup abort
I0202 14:17:16.836990 10322 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL aborts successfully.
I0202 14:17:16.837023 10322 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL watchdog thread joined.
I0202 14:17:16.837232 10322 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL heart beat monitor thread joined.
2026-02-02 14:17:17.372 | INFO     | __main__:main:72 - Sample save to: ./results/2026-02-02-14:17:16_seed42_A cat walks on the grass, realistic style..mp4
I0202 14:17:18.061129 10321 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL destructor entered.
W0202 14:17:18.061209 10321 ProcessGroupNCCL.cpp:1279] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
I0202 14:17:18.061225 10321 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 0] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:17:18.061466 10321 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 0] future is successfully executed for: ProcessGroup abort
I0202 14:17:18.061475 10321 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL aborts successfully.
I0202 14:17:18.061491 10321 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL watchdog thread joined.
I0202 14:17:18.061618 10321 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL heart beat monitor thread joined.