Run training...
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
WARNING:tensorflow:From train_gpu_test.py:23: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
WARNING:tensorflow:From train_gpu_test.py:492: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
WARNING:tensorflow:From train_gpu_test.py:492: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
WARNING:tensorflow:From train_gpu_test.py:482: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
W0623 14:46:20.521119 47187556010368 module_wrapper.py:139] From train_gpu_test.py:482: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
INFO:tensorflow:n_token 204
I0623 14:46:20.521361 47187556010368 train_gpu_test.py:482] n_token 204
INFO:tensorflow:[train] File names ['train.bsz-12.tlen-512.tfrecords']
I0623 14:46:20.531896 47187556010368 data_utils.py:434] [train] File names ['train.bsz-12.tlen-512.tfrecords']
INFO:tensorflow:num of batches 14483
I0623 14:46:20.532083 47187556010368 train_gpu_test.py:240] num of batches 14483
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.sparse_tensor_to_dense is deprecated. Please use tf.sparse.to_dense instead.
W0623 14:46:34.696085 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.sparse_tensor_to_dense is deprecated. Please use tf.sparse.to_dense instead.
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.
W0623 14:46:34.697554 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.
W0623 14:46:34.698294 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
W0623 14:46:34.701079 47187556010368 module_wrapper.py:139] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/data_utils.py:506: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0623 14:46:35.660339 47187556010368 deprecation.py:323] From /work/home/hepj/tf1/transformer-xl-master/tf/data_utils.py:506: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From train_gpu_test.py:247: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
W0623 14:46:35.673759 47187556010368 deprecation.py:323] From train_gpu_test.py:247: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
WARNING:tensorflow:From train_gpu_test.py:259: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0623 14:46:35.692192 47187556010368 module_wrapper.py:139] From train_gpu_test.py:259: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From train_gpu_test.py:259: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.
W0623 14:46:35.692516 47187556010368 module_wrapper.py:139] From train_gpu_test.py:259: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.
WARNING:tensorflow:From train_gpu_test.py:263: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0623 14:46:35.692863 47187556010368 module_wrapper.py:139] From train_gpu_test.py:263: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/gpu_utils.py:6: The name tf.NodeDef is deprecated. Please use tf.compat.v1.NodeDef instead.
W0623 14:46:35.693674 47187556010368 module_wrapper.py:139] From /work/home/hepj/tf1/transformer-xl-master/tf/gpu_utils.py:6: The name tf.NodeDef is deprecated. Please use tf.compat.v1.NodeDef instead.
WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W0623 14:46:35.704414 47187556010368 module_wrapper.py:139] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:416: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.
W0623 14:46:35.742786 47187556010368 module_wrapper.py:139] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:416: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.
WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:493: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
W0623 14:46:35.775503 47187556010368 deprecation.py:323] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:493: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:271: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0623 14:46:35.776226 47187556010368 deprecation.py:323] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:271: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:54: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W0623 14:46:35.801217 47187556010368 deprecation.py:323] From /work/home/hepj/tf1/transformer-xl-master/tf/model.py:54: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0623 14:46:36.060614 47187556010368 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From train_gpu_test.py:194: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
W0623 14:46:40.507537 47187556010368 module_wrapper.py:139] From train_gpu_test.py:194: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
INFO:tensorflow:#params: 41055436
I0623 14:46:40.517497 47187556010368 train_gpu_test.py:195] #params: 41055436
INFO:tensorflow:#params: 41055436
I0623 14:46:49.611661 47187556010368 train_gpu_test.py:195] #params: 41055436
INFO:tensorflow:#params: 41055436
I0623 14:46:58.740740 47187556010368 train_gpu_test.py:195] #params: 41055436
INFO:tensorflow:#params: 41055436
I0623 14:47:08.116391 47187556010368 train_gpu_test.py:195] #params: 41055436
WARNING:tensorflow:From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0623 14:47:13.709527 47187556010368 deprecation.py:323] From /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From train_gpu_test.py:292: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
W0623 14:47:13.892564 47187556010368 module_wrapper.py:139] From train_gpu_test.py:292: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
WARNING:tensorflow:From train_gpu_test.py:302: The name tf.train.cosine_decay is deprecated. Please use tf.compat.v1.train.cosine_decay instead.
W0623 14:47:13.896909 47187556010368 module_wrapper.py:139] From train_gpu_test.py:302: The name tf.train.cosine_decay is deprecated. Please use tf.compat.v1.train.cosine_decay instead.
WARNING:tensorflow:From train_gpu_test.py:313: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
W0623 14:47:13.910406 47187556010368 module_wrapper.py:139] From train_gpu_test.py:313: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
WARNING:tensorflow:From train_gpu_test.py:323: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
W0623 14:47:15.554019 47187556010368 module_wrapper.py:139] From train_gpu_test.py:323: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
WARNING:tensorflow:From train_gpu_test.py:325: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
W0623 14:47:15.950821 47187556010368 module_wrapper.py:139] From train_gpu_test.py:325: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From train_gpu_test.py:325: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
W0623 14:47:15.951256 47187556010368 module_wrapper.py:139] From train_gpu_test.py:325: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-06-23 14:47:15.951746: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2022-06-23 14:47:16.345128: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1999880000 Hz
2022-06-23 14:47:16.347222: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1399f370 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-06-23 14:47:16.347347: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-06-23 14:47:16.376028: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libamdhip64.so
2022-06-23 14:47:20.574412: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1d549690 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2022-06-23 14:47:20.574543: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.574586: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.574625: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.574663: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): C878180, AMDGPU ISA version: gfx906
2022-06-23 14:47:20.581876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 0 with properties: name: C878180 AMDGPU ISA: gfx906 memoryClockRate (GHz) 1.319 pciBusID 0000:04:00.0
2022-06-23 14:47:20.582075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 1 with properties: name: C878180 AMDGPU ISA: gfx906 memoryClockRate (GHz) 1.319 pciBusID 0000:26:00.0
2022-06-23 14:47:20.582181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 2 with properties: name: C878180 AMDGPU ISA: gfx906 memoryClockRate (GHz) 1.319 pciBusID 0000:43:00.0
2022-06-23 14:47:20.582264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1650] Found device 3 with properties: name: C878180 AMDGPU ISA: gfx906 memoryClockRate (GHz) 1.319 pciBusID 0000:63:00.0
2022-06-23 14:47:23.159813: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2022-06-23 14:47:23.222323: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2022-06-23 14:48:25.788779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2022-06-23 14:48:25.890072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2022-06-23 14:48:25.890632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
2022-06-23 14:48:25.890804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-23 14:48:25.890868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
2022-06-23 14:48:25.890934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
2022-06-23 14:48:25.890975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
2022-06-23 14:48:25.891013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
2022-06-23 14:48:25.891050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
2022-06-23 14:48:25.891650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14001 MB memory) -> physical GPU (device: 0, name: C878180, pci bus id: 0000:04:00.0)
2022-06-23 14:48:25.899617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 14001 MB memory) -> physical GPU (device: 1, name: C878180, pci bus id: 0000:26:00.0)
2022-06-23 14:48:25.913932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 14001 MB memory) -> physical GPU (device: 2, name: C878180, pci bus id: 0000:43:00.0)
2022-06-23 14:48:25.922425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 14001 MB memory) -> physical GPU (device: 3, name: C878180, pci bus id: 0000:63:00.0)
WARNING:tensorflow:From train_gpu_test.py:326: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
W0623 14:48:25.955481 47187556010368 module_wrapper.py:139] From train_gpu_test.py:326: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
2022-06-23 14:48:29.635470: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
2022-06-23 14:48:29.853604: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682108416 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.853823: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213896704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.853928: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892506624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854018: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703255552 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854108: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632929792 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854214: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669636608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854311: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802672640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854409: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022405120 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854502: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320164352 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854589: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688147968 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854702: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119332864 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854792: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607399424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.854922: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146659328 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855037: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731993344 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855140: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358793984 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855247: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022914560 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855350: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720623104 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855448: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448560640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855561: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203704576 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855672: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983334144 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855761: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785000704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855880: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606500608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.855996: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445850624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.856110: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301265664 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.856201: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171139072 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:29.856310: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054025216 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.325078: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
2022-06-23 14:48:54.519871: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682124032 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520098: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213911040 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520185: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892519936 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520269: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703267840 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520352: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632941056 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520434: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669646848 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520515: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802681856 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520596: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022413312 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520677: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320172032 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520757: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688154624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520838: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119339008 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.520931: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607405056 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521011: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146664448 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521100: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731997952 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521181: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358798080 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521262: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022918144 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521355: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720626176 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521436: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448563456 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521516: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203707136 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521597: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983336448 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521677: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785002752 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521756: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606502400 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521836: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445852160 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.521925: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301266944 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.522006: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171140352 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.522086: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054026496 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.550346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2022-06-23 14:48:54.751225: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682116096 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751450: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213903872 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751537: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892512768 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751620: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703261696 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751702: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632934912 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751797: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669640704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751887: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802676224 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.751969: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022408192 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752049: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320167424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752129: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688150528 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752209: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119335424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752299: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607401984 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752379: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146661632 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752460: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731995392 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752540: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358795776 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752621: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022916096 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752701: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720624384 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752780: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448561920 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752868: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203705600 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.752950: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983335168 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753033: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785001728 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753114: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606501632 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753194: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445851392 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753274: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301266176 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753353: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171139584 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.753433: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054025728 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.979658: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 13.67G (14682108416 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.979903: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 12.31G (13213896704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.979991: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 11.08G (11892506624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980075: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 9.97G (10703255552 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980159: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.97G (9632929792 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980239: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 8.07G (8669636608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980320: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 7.27G (7802672640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980401: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 6.54G (7022405120 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980482: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.89G (6320164352 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980562: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 5.30G (5688147968 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980656: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.77G (5119332864 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980750: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 4.29G (4607399424 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980831: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.86G (4146659328 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.980920: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.48G (3731993344 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981001: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 3.13G (3358793984 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981081: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.81G (3022914560 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981161: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.53G (2720623104 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981241: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.28G (2448560640 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981320: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 2.05G (2203704576 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981400: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.85G (1983334144 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981479: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.66G (1785000704 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981559: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.50G (1606500608 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981639: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.35G (1445850624 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981719: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.21G (1301265664 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981800: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1.09G (1171139072 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:54.981887: E tensorflow/stream_executor/rocm/rocm_driver.cc:645] failed to allocate 1005.20M (1054025216 bytes) from device: HIP_ERROR_OutOfMemory
2022-06-23 14:48:55.422345: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
2022-06-23 14:48:55.425431: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: error hipErrorInvalidKernelFile /work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co
Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: error hipErrorInvalidKernelFile /work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co
Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: error hipErrorInvalidKernelFile /work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co
Error 218(hipErrorInvalidKernelFile) /data/jenkins_workspace/workspace/rocBLAS_release/rocblas/build/release/virtualenv/lib64/python3.6/site-packages/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:84: error hipErrorInvalidKernelFile /work/home/hepj/app/dtk-22.04.1/rocblas/lib/library_dcu2/TensileLibrary_gfx906.co
rocBLAS error: Tensile solution found, but exception thrown for { a_type: "f32_r", b_type: "f32_r", c_type: "f32_r", d_type: "f32_r", compute_type: "f32_r", transA: 'N', transB: 'N', M: 1536, N: 3072, K: 512, alpha: 1, row_stride_a: 1, col_stride_a: 1536, row_stride_b: 1, col_stride_b: 512, row_stride_c: 1, col_stride_c: 1536, row_stride_d: 1, col_stride_d: 1536, beta: 0, batch_count: 1, strided_batch: true, stride_a: 0, stride_b: 0, stride_c: 0, stride_d: 0, atomics_mode: atomics_not_allowed }
Kernel Cijk_Ailk_Bljk_SB_MT128x64x16_SN_APM1_AF0EM1_AF1EM1_AMAS3_ASAE01_ASCE01_ASEM1_BL1_DTL0_ETSP_EPS0_FL0_GRVW4_GSU1_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MAC_MDA2_NLCA1_NLCB1_ONLL1_PK0_PGR0_PLR1_RK0_SU32_SUM0_SUS256_SVW4_SNLL0_TT8_4_USFGRO0_VAW1_VS1_VW4_WG16_16_1_WGM1 not found in any loaded module.
This message will be only be displayed once, unless the ROCBLAS_VERBOSE_TENSILE_ERROR environment variable is set.
2022-06-23 14:49:05.193083: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193084: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193098: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193602: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193683: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193756: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.193945: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194235: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194547: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194827: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195128: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.194451: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195339: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195407: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195598: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195656: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.195961: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196166: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196230: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196526: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196709: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196821: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.196884: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.197241: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.197292: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.197748: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198108: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198415: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198601: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.198852: E tensorflow/stream_executor/rocm/rocm_blas.cc:416] failed to run ROCBLAS routine rocblas_sgemm: rocblas_status_internal_error
2022-06-23 14:49:05.906378: W tensorflow/core/kernels/data/cache_dataset_ops.cc:824] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
==================================================
/work/home/hepj/tf1/transformer-xl-master/data/enwik8//tfrecords/record_info-train.bsz-12.tlen-512.json
==================================================
Traceback (most recent call last):
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
     [[{{node transformer_1/layer_2/rel_attn/r/Tensordot/MatMul}}]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
     [[{{node transformer/layer_2/rel_attn/r/Tensordot/MatMul}}]]
0 successful operations.
3 derived errors ignored.
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_gpu_test.py", line 492, in <module>
    tf.app.run()
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train_gpu_test.py", line 486, in main
    train(n_token, cutoffs, "/gpu:0")
  File "train_gpu_test.py", line 341, in train
    fetched = sess.run(fetches, feed_dict=feed_dict)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
     [[node transformer_1/layer_2/rel_attn/r/Tensordot/MatMul (defined at /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1024, 512), b.shape=(512, 512), m=1024, n=512, k=512
     [[node transformer/layer_2/rel_attn/r/Tensordot/MatMul (defined at /work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
3 derived errors ignored.
Original stack trace for 'transformer_1/layer_2/rel_attn/r/Tensordot/MatMul':
  File "train_gpu_test.py", line 492, in <module>
    tf.app.run()
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train_gpu_test.py", line 486, in main
    train(n_token, cutoffs, "/gpu:0")
  File "train_gpu_test.py", line 271, in train
    mems=mems_i)
  File "train_gpu_test.py", line 223, in single_core_graph
    is_training=is_training)
  File "train_gpu_test.py", line 191, in model_fn
    proj_same_dim=FLAGS.proj_same_dim)
  File "/work/home/hepj/tf1/transformer-xl-master/tf/model.py", line 517, in transformer
    kernel_initializer=initializer)
  File "/work/home/hepj/tf1/transformer-xl-master/tf/model.py", line 56, in rel_multihead_attn
    kernel_initializer=kernel_initializer, name='r')
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py", line 187, in dense
    return layer.apply(inputs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1700, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
    return converted_call(f, options, args, kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/core.py", line 1039, in call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 4096, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2754, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6236, in mat_mul
    name=name)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/work/home/hepj/.pyenv/versions/tf1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()