bw-video-len_33-step_20-num-1.log

Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=33, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=1, ring_degree=1)
2026-02-02 14:09:46.064 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
2026-02-02 14:09:46.065 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:09:46.741 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:10:02.963 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:10:05.461 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:10:05.633 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:02<00:07,  2.49s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:05<00:05,  2.71s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:08<00:02,  2.77s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:08<00:00,  1.79s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:08<00:00,  2.12s/it]
2026-02-02 14:10:19.819 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:10:23.769 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:10:24.283 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:10:24.447 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:10:24.500 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:10:24.595 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:10:24.617 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(

  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:12<00:12, 12.57s/it]
100%|██████████| 2/2 [00:20<00:00,  9.91s/it]
100%|██████████| 2/2 [00:20<00:00, 10.30s/it]
2026-02-02 14:11:05.154 | INFO     | hyvideo.inference:predict:671 - Success, time: 40.5368127822876
2026-02-02 14:11:05.154 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:11:05.180 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0

  0%|          | 0/20 [00:00<?, ?it/s]
  5%|▌         | 1/20 [00:08<02:35,  8.17s/it]
 10%|█         | 2/20 [00:16<02:25,  8.10s/it]
 15%|█▌        | 3/20 [00:24<02:18,  8.13s/it]
 20%|██        | 4/20 [00:32<02:10,  8.14s/it]
 25%|██▌       | 5/20 [00:40<02:02,  8.14s/it]
 30%|███       | 6/20 [00:48<01:54,  8.15s/it]
 35%|███▌      | 7/20 [00:57<01:45,  8.15s/it]
 40%|████      | 8/20 [01:05<01:37,  8.16s/it]
 45%|████▌     | 9/20 [01:13<01:29,  8.16s/it]
 50%|█████     | 10/20 [01:21<01:21,  8.16s/it]
 55%|█████▌    | 11/20 [01:29<01:13,  8.16s/it]
 60%|██████    | 12/20 [01:37<01:05,  8.16s/it]
 65%|██████▌   | 13/20 [01:46<00:57,  8.17s/it]
 70%|███████   | 14/20 [01:54<00:49,  8.17s/it]
 75%|███████▌  | 15/20 [02:02<00:40,  8.17s/it]
 80%|████████  | 16/20 [02:10<00:32,  8.17s/it]
 85%|████████▌ | 17/20 [02:18<00:24,  8.16s/it]
 90%|█████████ | 18/20 [02:26<00:16,  8.16s/it]
 95%|█████████▌| 19/20 [02:35<00:08,  8.17s/it]
100%|██████████| 20/20 [02:43<00:00,  8.16s/it]
2026-02-02 14:14:02.787 | INFO     | hyvideo.inference:predict:671 - Success, time: 177.60623216629028
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2026-02-02 14:14:03.989 | INFO     | __main__:main:72 - Sample save to: ./results/2026-02-02-14:14:02_seed42_A cat walks on the grass, realistic style..mp4