Run training...
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From train_gpu_test.py:492: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From train_gpu_test.py:492: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From train_gpu_test.py:482: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

W0623 14:46:20.521119 47187556010368 module_wrapper.py:139] From train_gpu_test.py:482: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

INFO:tensorflow:n_token 204
I0623 14:46:20.521361 47187556010368 train_gpu_test.py:482] n_token 204
INFO:tensorflow:[train] File names ['train.bsz-12.tlen-512.tfrecords']
I0623 14:46:20.531896 47187556010368 data_utils.py:434] [train] File names ['train.bsz-12.tlen-512.tfrecords']
INFO:tensorflow:num of batches 14483
I0623 14:46:20.532083 47187556010368 train_gpu_test.py:240] num of batches 14483
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.sparse_tensor_to_dense is deprecated. Please use tf.sparse.to_dense instead.

W0623 14:46:34.696085 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.sparse_tensor_to_dense is deprecated. Please use tf.sparse.to_dense instead.

WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W0623 14:46:34.697554 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

W0623 14:46:34.698294 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

W0623 14:46:34.701079 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/data_utils.py:506: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0623 14:46:35.660339 47187556010368 deprecation.py:323] From /work/home/hepj/tf1/transformer-xl-master/tf/data_utils.py:506: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From train_gpu_test.py:247: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
W0623 14:46:35.673759 47187556010368 deprecation.py:323] From train_gpu_test.py:247: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
WARNING:tensorflow:From train_gpu_test.py:259: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0623 14:46:35.692192 47187556010368 module_wrapper.py:139] From train_gpu_test.py:259: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From train_gpu_test.py:259: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

W0623 14:46:35.692516 47187556010368 module_wrapper.py:139] From train_gpu_test.py:259: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

WARNING:tensorflow:From train_gpu_test.py:263: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0623 14:46:35.692863 47187556010368 module_wrapper.py:139] From train_gpu_test.py:263: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/gpu_utils.py:6: The name tf.NodeDef is deprecated. Please use tf.compat.v1.NodeDef instead.

W0623 14:46:35.693674 47187556010368 module_wrapper.py:139] From /work/home/hepj/tf1/transformer-xl-master/tf/gpu_utils.py:6: The name tf.NodeDef is deprecated. Please use tf.compat.v1.NodeDef instead.

WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

W0623 14:46:35.704414 47187556010368 module_wrapper.py:139] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:416: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.

W0623 14:46:35.742786 47187556010368 module_wrapper.py:139] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:416: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.

WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:493: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
W0623 14:46:35.775503 47187556010368 deprecation.py:323] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:493: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:271: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0623 14:46:35.776226 47187556010368 deprecation.py:323] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:271: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:54: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W0623 14:46:35.801217 47187556010368 deprecation.py:323] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:54: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0623 14:46:36.060614 47187556010368 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From train_gpu_test.py:194: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

W0623 14:46:40.507537 47187556010368 module_wrapper.py:139] From train_gpu_test.py:194: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

INFO:tensorflow:#params: 41055436
I0623 14:46:40.517497 47187556010368 train_gpu_test.py:195] #params: 41055436
INFO:tensorflow:#params: 41055436
I0623 14:46:49.611661 47187556010368 train_gpu_test.py:195] #params: 41055436
INFO:tensorflow:#params: 41055436
I0623 14:46:58.740740 47187556010368 train_gpu_test.py:195] #params: 41055436
INFO:tensorflow:#params: 41055436
I0623 14:47:08.116391 47187556010368 train_gpu_test.py:195] #params: 41055436
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0623 14:47:13.709527 47187556010368 deprecation.py:323] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From train_gpu_test.py:292: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

W0623 14:47:13.892564 47187556010368 module_wrapper.py:139] From train_gpu_test.py:292: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From train_gpu_test.py:302: The name tf.train.cosine_decay is deprecated. Please use tf.compat.v1.train.cosine_decay instead.

W0623 14:47:13.896909 47187556010368 module_wrapper.py:139] From train_gpu_test.py:302: The name tf.train.cosine_decay is deprecated. Please use tf.compat.v1.train.cosine_decay instead.

WARNING:tensorflow:From train_gpu_test.py:313: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0623 14:47:13.910406 47187556010368 module_wrapper.py:139] From train_gpu_test.py:313: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From train_gpu_test.py:323: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W0623 14:47:15.554019 47187556010368 module_wrapper.py:139] From train_gpu_test.py:323: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From train_gpu_test.py:325: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

W0623 14:47:15.950821 47187556010368 module_wrapper.py:139] From train_gpu_test.py:325: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From train_gpu_test.py:325: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W0623 14:47:15.951256 47187556010368 module_wrapper.py:139] From train_gpu_test.py:325: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2022-06-23 14:47:15.951746: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2022-06-23 14:47:16.345128: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1999880000 Hz
2022-06-23 14:47:16.347222: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1399f370 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-06-23 14:47:16.347347: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-06-23 14:47:16.376028: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libamdhip64.so
2022-06-23 14:47:20.574412: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1d549690 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2022-06-23 14:47:20.574543: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.574586: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.574625: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.574663: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.581876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 0 with properties: 
name: C878180
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.319
pciBusID 0000:04:00.0
2022-06-23 14:47:20.582075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 1 with properties: 
name: C878180
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.319
pciBusID 0000:26:00.0
2022-06-23 14:47:20.582181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 2 with properties: 
name: C878180
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.319
pciBusID 0000:43:00.0
2022-06-23 14:47:20.582264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 3 with properties: 
name: C878180
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.319
pciBusID 0000:63:00.0
2022-06-23 14:47:23.159813: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2022-06-23 14:47:23.222323: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2022-06-23 14:48:25.788779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2022-06-23 14:48:25.890072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2022-06-23 14:48:25.890632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
2022-06-23 14:48:25.890804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-23 14:48:25.890868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3 
2022-06-23 14:48:25.890934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y 
2022-06-23 14:48:25.890975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y 
2022-06-23 14:48:25.891013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y 
2022-06-23 14:48:25.891050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N 
2022-06-23 14:48:25.891650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14001 MB memory) -> physical GPU (device: 0, name: C878180, pci bus id: 0000:04:00.0)
2022-06-23 14:48:25.899617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 14001 MB memory) -> physical GPU (device: 1, name: C878180, pci bus id: 0000:26:00.0)
2022-06-23 14:48:25.913932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 14001 MB memory) -> physical GPU (device: 2, name: C878180, pci bus id: 0000:43:00.0)
2022-06-23 14:48:25.922425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 14001 MB memory) -> physical GPU (device: 3, name: C878180, pci bus id: 0000:63:00.0)
WARNING:tensorflow:From train_gpu_test.py:326: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

W0623 14:48:25.955481 47187556010368 module_wrapper.py:139] From train_gpu_test.py:326: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

2022-06-23 14:48:29.635470: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
2022-06-23 14:48:29.853604: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682108416 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.853823: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213896704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.853928: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892506624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854018: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703255552 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854108: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632929792 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854214: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669636608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854311: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802672640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854409: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022405120 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854502: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320164352 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854589: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688147968 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854702: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119332864 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854792: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607399424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854922: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146659328 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855037: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731993344 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855140: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358793984 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855247: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022914560 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855350: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720623104 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855448: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448560640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855561: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203704576 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855672: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983334144 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855761: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785000704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855880: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606500608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855996: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445850624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.856110: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301265664 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.856201: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171139072 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.856310: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054025216 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.325078: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
2022-06-23 14:48:54.519871: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682124032 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520098: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213911040 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520185: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892519936 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520269: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703267840 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520352: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632941056 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520434: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669646848 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520515: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802681856 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520596: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022413312 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520677: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320172032 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520757: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688154624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520838: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119339008 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520931: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607405056 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521011: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146664448 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521100: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731997952 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521181: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358798080 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521262: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022918144 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521355: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720626176 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521436: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448563456 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521516: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203707136 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521597: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983336448 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521677: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785002752 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521756: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606502400 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521836: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445852160 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521925: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301266944 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.522006: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171140352 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.522086: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054026496 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.550346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2022-06-23 14:48:54.751225: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682116096 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751450: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213903872 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751537: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892512768 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751620: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703261696 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751702: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632934912 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751797: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669640704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751887: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802676224 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751969: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022408192 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752049: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320167424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752129: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688150528 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752209: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119335424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752299: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607401984 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752379: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146661632 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752460: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731995392 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752540: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358795776 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752621: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022916096 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752701: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720624384 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752780: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448561920 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752868: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203705600 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752950: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983335168 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753033: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785001728 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753114: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606501632 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753194: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445851392 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753274: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301266176 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753353: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171139584 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753433: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054025728 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.979658: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682108416 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.979903: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213896704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.979991: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892506624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980075: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703255552 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980159: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632929792 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980239: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669636608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980320: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802672640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980401: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022405120 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980482: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320164352 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980562: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688147968 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980656: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119332864 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980750: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607399424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980831: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146659328 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980920: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731993344 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981001: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358793984 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981081: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022914560 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981161: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720623104 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981241: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448560640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981320: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203704576 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981400: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983334144 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981479: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785000704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981559: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606500608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981639: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445850624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981719: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301265664 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981800: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171139072 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981887: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054025216 bytes) from device: HIP_ERROR_OutOfMemory
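Note: the repeated HIP_ERROR_OutOfMemory retries above come from TensorFlow's default allocator trying to reserve close to the full 14001 MB reported per device and then backing off; this usually means the device memory is already held by another process, or that an incremental-allocation policy is needed. A minimal sketch of capping the TF 1.x session's GPU memory, assuming the session is built with tf.Session(config=tf.ConfigProto(...)) as the deprecation warnings around train_gpu_test.py:325 suggest:

    import tensorflow as tf  # TF 1.x, matching this log

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate incrementally instead of grabbing ~13.67G up front
    # config.gpu_options.per_process_gpu_memory_fraction = 0.9  # alternative: hard per-process cap
    sess = tf.Session(config=config)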
2022-06-23 14:48:55.422345: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
2022-06-23 14:48:55.425431: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: 
error
hipErrorInvalidKernelFile
/work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co

Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: 
error
hipErrorInvalidKernelFile
/work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co

Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: 
error
hipErrorInvalidKernelFile
/work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co

Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: 
error
hipErrorInvalidKernelFile
/work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co


rocBLAS error: Tensile solution found, but exception thrown for { a_type: "f32_r", b_type: "f32_r", c_type: "f32_r", d_type: "f32_r", compute_type: "f32_r", transA: 'N', transB: 'N', M: 1536, N: 3072, K: 512, alpha: 1, row_stride_a: 1, col_stride_a: 1536, row_stride_b: 1, col_stride_b: 512, row_stride_c: 1, col_stride_c: 1536, row_stride_d: 1, col_stride_d: 1536, beta: 0, batch_count: 1, strided_batch: true, stride_a: 0, stride_b: 0, stride_c: 0, stride_d: 0, atomics_mode: atomics_not_allowed }
Kernel Cijk_Ailk_Bljk_SB_MT128x64x16_SN_APM1_AF0EM1_AF1EM1_AMAS3_ASAE01_ASCE01_ASEM1_BL1_DTL0_ETSP_EPS0_FL0_GRVW4_GSU1_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MAC_MDA2_NLCA1_NLCB1_ONLL1_PK0_PGR0_PLR1_RK0_SU32_SUM0_SUS256_SVW4_SNLL0_TT8_4_USFGRO0_VAW1_VS1_VW4_WG16_16_1_WGM1 not found in any loaded module.
This message will be only be displayed once, unless the ROCBLAS_VERBOSE_TENSILE_ERROR environment variable is set.
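Note: the rocBLAS message above names the ROCBLAS_VERBOSE_TENSILE_ERROR environment variable and the code-object file /work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co. A minimal sketch for getting more detail and sanity-checking that file, assuming that setting the variable in the process environment before the first GEMM is enough for rocBLAS to pick it up:

    import os

    # must be set before TensorFlow triggers the first rocBLAS GEMM
    os.environ["ROCBLAS_VERBOSE_TENSILE_ERROR"] = "1"

    # does the code object named in the hipErrorInvalidKernelFile error exist and have a plausible size?
    co = "/work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co"
    print(co, os.path.exists(co), os.path.getsize(co) if os.path.exists(co) else "missing")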
2022-06-23 14:49:05.193083: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193084: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193098: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193602: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193683: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193756: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193945: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194235: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194547: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194827: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195128: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194451: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195339: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195407: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195598: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195656: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195961: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196166: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196230: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196526: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196709: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196821: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196884: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.197241: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.197292: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.197748: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198108: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198415: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198601: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198852: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.906378: W tensorflow/core/kernels/data/cache_dataset_ops.cc:824] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
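Note: the warning above spells out the recommended reordering (dataset.take(k).cache().repeat() instead of dataset.cache().take(k).repeat()). A minimal sketch of that ordering on the training records named earlier in this log; k and the pipeline shape are placeholders, not the actual data_utils.py pipeline:

    import tensorflow as tf  # TF 1.x

    dataset = tf.data.TFRecordDataset(["train.bsz-12.tlen-512.tfrecords"])
    k = 1000  # placeholder count
    # recommended order: cache only what is actually read, then repeat
    dataset = dataset.take(k).cache().repeat()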
==================================================
/work/home/hepj/tf1/transformer-xl-master/data/enwik8//tfrecords/record_info-train.bsz-12.tlen-512.json
==================================================
Traceback (most recent call last):
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
	 [[{{node transformer_1/layer_2/rel_attn/r/Tensordot/MatMul}}]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
	 [[{{node transformer/layer_2/rel_attn/r/Tensordot/MatMul}}]]
0 successful operations.
3 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_gpu_test.py", line 492, in <module>
    tf.app.run()
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train_gpu_test.py", line 486, in main
    train(n_token, cutoffs, "/gpu:0")
  File "train_gpu_test.py", line 341, in train
    fetched = sess.run(fetches, feed_dict=feed_dict)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
	 [[node transformer_1/layer_2/rel_attn/r/Tensordot/MatMul (defined at /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
	 [[node transformer/layer_2/rel_attn/r/Tensordot/MatMul (defined at /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
3 derived errors ignored.

Original stack trace for 'transformer_1/layer_2/rel_attn/r/Tensordot/MatMul':
  File "train_gpu_test.py", line 492, in <module>
    tf.app.run()
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train_gpu_test.py", line 486, in main
    train(n_token, cutoffs, "/gpu:0")
  File "train_gpu_test.py", line 271, in train
    mems=mems_i)
  File "train_gpu_test.py", line 223, in single_core_graph
    is_training=is_training)
  File "train_gpu_test.py", line 191, in model_fn
    proj_same_dim=FLAGS.proj_same_dim)
  File "/work/home/hepj/tf1/transformer-xl-master/tf/model.py", line 517, in transformer
    kernel_initializer=initializer)
  File "/work/home/hepj/tf1/transformer-xl-master/tf/model.py", line 56, in rel_multihead_attn
    kernel_initializer=kernel_initializer, name='r')
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py", line 187, in dense
    return layer.apply(inputs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1700, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
    return converted_call(f, options, args, kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/core.py", line 1039, in call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 4096, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2754, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6236, in mat_mul
    name=name)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
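Note: both root errors reduce to a single float32 GEMM with a.shape=(1024, 512) and b.shape=(512, 512) failing inside rocblas_sgemm. A minimal sketch to reproduce that matmul outside the Transformer-XL graph and confirm whether rocBLAS can run it at all in this environment, assuming the same TF 1.x setup as this log:

    import numpy as np
    import tensorflow as tf  # TF 1.x

    a = tf.constant(np.random.rand(1024, 512).astype(np.float32))
    b = tf.constant(np.random.rand(512, 512).astype(np.float32))
    c = tf.matmul(a, b)  # same shape as the failing Tensordot/MatMul node

    with tf.Session() as sess:
        print(sess.run(c).shape)  # expect (1024, 512) if the GEMM succeeds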