INFO 05-28 17:12:33 __init__.py:193] Automatically detected platform rocm.
INFO 05-28 17:12:33 __init__.py:193] Automatically detected platform rocm.
INFO 05-28 17:12:33 __init__.py:193] Automatically detected platform rocm.
INFO 05-28 17:12:33 __init__.py:193] Automatically detected platform rocm.
INFO 05-28 17:12:33 __init__.py:193] Automatically detected platform rocm.
INFO 05-28 17:12:33 __init__.py:193] Automatically detected platform rocm.
INFO 05-28 17:12:33 __init__.py:193] Automatically detected platform rocm.
INFO 05-28 17:12:34 __init__.py:193] Automatically detected platform rocm.
Could not load Sliding Tile Attention.
Could not load Sliding Tile Attention.
Could not load Sliding Tile Attention.
Could not load Sliding Tile Attention.
Could not load Sliding Tile Attention.
Could not load Sliding Tile Attention.
Could not load Sliding Tile Attention.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
--> loading model from /home/model/HunyuanVideo/hunyuan-video-t2v-720p
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Could not load Sliding Tile Attention.
  Total training parameters = 12821.012544 M
--> Initializing FSDP with sharding strategy: full
--> applying fsdp activation checkpointing...
--> model loaded
FullyShardedDataParallel(
  (_fsdp_wrapped_module): HYVideoDiffusionTransformer(
    (img_in): PatchEmbed(
      (proj): Conv3d(16, 3072, kernel_size=(1, 2, 2), stride=(1, 2, 2))
      (norm): Identity()
    )
    (txt_in): SingleTokenRefiner(
      (input_embedder): Linear(in_features=4096, out_features=3072, bias=True)
      (t_embedder): TimestepEmbedder(
        (mlp): Sequential(
          (0): Linear(in_features=256, out_features=3072, bias=True)
          (1): SiLU()
          (2): Linear(in_features=3072, out_features=3072, bias=True)
        )
      )
      (c_embedder): TextProjection(
        (linear_1): Linear(in_features=4096, out_features=3072, bias=True)
        (act_1): SiLU()
        (linear_2): Linear(in_features=3072, out_features=3072, bias=True)
      )
      (individual_token_refiner): IndividualTokenRefiner(
        (blocks): ModuleList(
          (0-1): 2 x IndividualTokenRefinerBlock(
            (norm1): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
            (self_attn_qkv): Linear(in_features=3072, out_features=9216, bias=True)
            (self_attn_q_norm): Identity()
            (self_attn_k_norm): Identity()
            (self_attn_proj): Linear(in_features=3072, out_features=3072, bias=True)
            (norm2): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
            (mlp): MLP(
              (fc1): Linear(in_features=3072, out_features=12288, bias=True)
              (act): SiLU()
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=12288, out_features=3072, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
            (adaLN_modulation): Sequential(
              (0): SiLU()
              (1): Linear(in_features=3072, out_features=6144, bias=True)
            )
          )
        )
      )
    )
    (time_in): TimestepEmbedder(
      (mlp): Sequential(
        (0): Linear(in_features=256, out_features=3072, bias=True)
        (1): SiLU()
        (2): Linear(in_features=3072, out_features=3072, bias=True)
      )
    )
    (vector_in): MLPEmbedder(
      (in_layer): Linear(in_features=768, out_features=3072, bias=True)
      (silu): SiLU()
      (out_layer): Linear(in_features=3072, out_features=3072, bias=True)
    )
    (guidance_in): TimestepEmbedder(
      (mlp): Sequential(
        (0): Linear(in_features=256, out_features=3072, bias=True)
        (1): SiLU()
        (2): Linear(in_features=3072, out_features=3072, bias=True)
      )
    )
    (double_blocks): ModuleList(
      (0-19): 20 x FullyShardedDataParallel(
        (_fsdp_wrapped_module): CheckpointWrapper(
          (_checkpoint_wrapped_module): MMDoubleStreamBlock(
            (img_mod): ModulateDiT(
              (act): SiLU()
              (linear): Linear(in_features=3072, out_features=18432, bias=True)
            )
            (img_norm1): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (img_attn_qkv): Linear(in_features=3072, out_features=9216, bias=True)
            (img_attn_q_norm): RMSNorm()
            (img_attn_k_norm): RMSNorm()
            (img_attn_proj): Linear(in_features=3072, out_features=3072, bias=True)
            (img_norm2): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (img_mlp): MLP(
              (fc1): Linear(in_features=3072, out_features=12288, bias=True)
              (act): GELU(approximate='tanh')
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=12288, out_features=3072, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
            (txt_mod): ModulateDiT(
              (act): SiLU()
              (linear): Linear(in_features=3072, out_features=18432, bias=True)
            )
            (txt_norm1): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (txt_attn_qkv): Linear(in_features=3072, out_features=9216, bias=True)
            (txt_attn_q_norm): RMSNorm()
            (txt_attn_k_norm): RMSNorm()
            (txt_attn_proj): Linear(in_features=3072, out_features=3072, bias=True)
            (txt_norm2): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (txt_mlp): MLP(
              (fc1): Linear(in_features=3072, out_features=12288, bias=True)
              (act): GELU(approximate='tanh')
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=12288, out_features=3072, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
          )
        )
      )
    )
    (single_blocks): ModuleList(
      (0-39): 40 x FullyShardedDataParallel(
        (_fsdp_wrapped_module): CheckpointWrapper(
          (_checkpoint_wrapped_module): MMSingleStreamBlock(
            (linear1): Linear(in_features=3072, out_features=21504, bias=True)
            (linear2): Linear(in_features=15360, out_features=3072, bias=True)
            (q_norm): RMSNorm()
            (k_norm): RMSNorm()
            (pre_norm): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (mlp_act): GELU(approximate='tanh')
            (modulation): ModulateDiT(
              (act): SiLU()
              (linear): Linear(in_features=3072, out_features=9216, bias=True)
            )
          )
        )
      )
    )
    (final_layer): FinalLayer(
      (norm_final): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
      (linear): Linear(in_features=3072, out_features=64, bias=True)
      (adaLN_modulation): Sequential(
        (0): SiLU()
        (1): Linear(in_features=3072, out_features=6144, bias=True)
      )
    )
  )
)
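
For reference, the nested FullyShardedDataParallel / CheckpointWrapper structure in the printout above is what PyTorch produces when the model is first wrapped with FSDP FULL_SHARD and activation checkpointing is then applied per transformer block. A minimal sketch, not the training script itself; shard_model and the block-class argument are illustrative:

import functools
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)

def shard_model(model, block_classes):
    # Assumes torch.distributed.init_process_group() has already run, one rank per GPU.
    policy = functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls=set(block_classes)
    )
    # "--> Initializing FSDP with sharding strategy: full"
    model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD,
                 auto_wrap_policy=policy)
    # "--> applying fsdp activation checkpointing..." -- wraps each block in a
    # CheckpointWrapper *inside* its FSDP unit, matching the printout above.
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=checkpoint_wrapper,
        check_fn=lambda m: isinstance(m, tuple(block_classes)),
    )
    return model

# e.g. shard_model(transformer, (MMDoubleStreamBlock, MMSingleStreamBlock))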
optimizer: AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 1e-05
    maximize: False
    weight_decay: 0.01
)
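
The optimizer printout reconstructs directly in PyTorch; a sketch, where model stands for the FSDP-wrapped transformer above:

import torch

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)
# amsgrad, maximize, capturable, differentiable, foreach, fused are left at
# their defaults, matching the values printed above.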
***** Running training *****
  Num examples = 101
  Dataloader size = 13
  Num Epochs = 1
  Resume training from step 0
  Instantaneous batch size per device = 1
  Total train batch size (w. data & sequence parallel, accumulation) = 2.0
  Gradient Accumulation steps = 1
  Total optimization steps = 12
  Total training parameters per FSDP shard = 1.602626568 B
  Master weight dtype: torch.float32
--> applying fsdp activation checkpointing...
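
A quick sanity check of the printed bookkeeping, assuming the eight ranks seen in the "platform rocm" startup lines (the sequence-parallel degree below is inferred, not printed):

total_params_m = 12821.012544             # "Total training parameters", in millions
world_size = 8                            # eight "platform rocm" startup lines
print(total_params_m / world_size / 1e3)  # 1.602626568 -> per-FSDP-shard parameters, in B

per_device_batch = 1
total_train_batch = 2.0
# With 8 devices and per-device batch 1, a total train batch of 2.0 implies a
# data-parallel degree of 2, i.e. sequence parallelism over groups of 4 GPUs.
print(world_size * per_device_batch / total_train_batch)  # inferred sp degree: 4.0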
zll step_time: 135.40s avg_step_time: 135.4012050628662
zll step_time: 122.44s avg_step_time: 128.91861820220947
zll step_time: 122.18s avg_step_time: 126.67362546920776
zll step_time: 122.14s avg_step_time: 125.5411741733551
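
The avg_step_time column above is a plain cumulative mean of step_time (135.40 and 122.44 average to 128.9186...). A sketch of a timer that would emit these lines, where "zll" is the log's own tag and run_step is an assumed callable for one optimizer step:

import time

step_times = []

def timed_step(run_step):
    t0 = time.time()
    run_step()  # one forward/backward/optimizer step
    step_times.append(time.time() - t0)
    avg = sum(step_times) / len(step_times)  # cumulative mean over all steps so far
    print(f"zll step_time: {step_times[-1]:.2f}s avg_step_time: {avg}")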