WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.787012 20351 ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
I1109 17:32:11.786959 18664 ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.787282 20352 ProcessGroupNCCL.cpp:835] [Rank 3] NCCL watchdog thread started!
I1109 17:32:11.787258 18666 ProcessGroupNCCL.cpp:669] [Rank 3] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.787432 20353 ProcessGroupNCCL.cpp:835] [Rank 1] NCCL watchdog thread started!
I1109 17:32:11.787400 18663 ProcessGroupNCCL.cpp:669] [Rank 1] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.787524 18665 ProcessGroupNCCL.cpp:669] [Rank 2] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.787559 20354 ProcessGroupNCCL.cpp:835] [Rank 2] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.788462 32398 ProcessGroupNCCL.cpp:669] [Rank 65] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.788563  1315 ProcessGroupNCCL.cpp:835] [Rank 65] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.788479 32396 ProcessGroupNCCL.cpp:669] [Rank 64] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.788592  1316 ProcessGroupNCCL.cpp:835] [Rank 64] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.788831  1317 ProcessGroupNCCL.cpp:835] [Rank 66] NCCL watchdog thread started!
I1109 17:32:11.788806 32395 ProcessGroupNCCL.cpp:669] [Rank 66] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.789428 32397 ProcessGroupNCCL.cpp:669] [Rank 67] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.789516  1318 ProcessGroupNCCL.cpp:835] [Rank 67] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.835148  1946 ProcessGroupNCCL.cpp:835] [Rank 47] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.836906  1944 ProcessGroupNCCL.cpp:835] [Rank 46] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.837581  1945 ProcessGroupNCCL.cpp:835] [Rank 45] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.837880  1943 ProcessGroupNCCL.cpp:835] [Rank 44] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.862900 28961 ProcessGroupNCCL.cpp:835] [Rank 18] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.860137  8802 ProcessGroupNCCL.cpp:835] [Rank 89] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.864642 28963 ProcessGroupNCCL.cpp:835] [Rank 19] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.864675 28962 ProcessGroupNCCL.cpp:835] [Rank 16] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.864917 28960 ProcessGroupNCCL.cpp:835] [Rank 17] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.862298  8800 ProcessGroupNCCL.cpp:835] [Rank 88] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.867502  8889 ProcessGroupNCCL.cpp:835] [Rank 35] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.867841  1166 ProcessGroupNCCL.cpp:835] [Rank 94] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.867995  1164 ProcessGroupNCCL.cpp:835] [Rank 92] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.872861  5798 ProcessGroupNCCL.cpp:835] [Rank 63] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.865708 21966 ProcessGroupNCCL.cpp:835] [Rank 84] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.872879  5796 ProcessGroupNCCL.cpp:835] [Rank 61] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.865746 21964 ProcessGroupNCCL.cpp:835] [Rank 86] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.868957  1167 ProcessGroupNCCL.cpp:835] [Rank 95] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.869498  8891 ProcessGroupNCCL.cpp:835] [Rank 32] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.866041 21965 ProcessGroupNCCL.cpp:835] [Rank 85] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.866421 16145 ProcessGroupNCCL.cpp:835] [Rank 72] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.867938 29324 ProcessGroupNCCL.cpp:835] [Rank 39] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.866178 21963 ProcessGroupNCCL.cpp:835] [Rank 87] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.873348  5797 ProcessGroupNCCL.cpp:835] [Rank 60] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.873979  5186 ProcessGroupNCCL.cpp:835] [Rank 22] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.869545  1165 ProcessGroupNCCL.cpp:835] [Rank 93] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.873389 18354 ProcessGroupNCCL.cpp:835] [Rank 50] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.874648 20008 ProcessGroupNCCL.cpp:835] [Rank 12] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.868005 16147 ProcessGroupNCCL.cpp:835] [Rank 73] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.871151  8890 ProcessGroupNCCL.cpp:835] [Rank 34] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.872207 11736 ProcessGroupNCCL.cpp:835] [Rank 83] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.874883 18352 ProcessGroupNCCL.cpp:835] [Rank 49] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875619  5185 ProcessGroupNCCL.cpp:835] [Rank 20] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875175  5795 ProcessGroupNCCL.cpp:835] [Rank 62] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875759  5187 ProcessGroupNCCL.cpp:835] [Rank 21] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875882 27599 ProcessGroupNCCL.cpp:835] [Rank 24] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875978 27598 ProcessGroupNCCL.cpp:835] [Rank 26] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875635 18353 ProcessGroupNCCL.cpp:835] [Rank 48] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.870266 28543 ProcessGroupNCCL.cpp:835] [Rank 71] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.876502 27600 ProcessGroupNCCL.cpp:835] [Rank 27] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.869139  8801 ProcessGroupNCCL.cpp:835] [Rank 91] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.876945 27597 ProcessGroupNCCL.cpp:835] [Rank 25] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.876405 18355 ProcessGroupNCCL.cpp:835] [Rank 51] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.877225  5188 ProcessGroupNCCL.cpp:835] [Rank 23] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.869225 28416 ProcessGroupNCCL.cpp:835] [Rank 4] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.869176 12770 ProcessGroupNCCL.cpp:835] [Rank 76] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.877326  9214 ProcessGroupNCCL.cpp:835] [Rank 28] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.877411  9212 ProcessGroupNCCL.cpp:835] [Rank 31] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875149 11735 ProcessGroupNCCL.cpp:835] [Rank 80] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.877018 21465 ProcessGroupNCCL.cpp:835] [Rank 57] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875082 17487 ProcessGroupNCCL.cpp:835] [Rank 11] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.870350 12771 ProcessGroupNCCL.cpp:835] [Rank 79] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.878412  9213 ProcessGroupNCCL.cpp:835] [Rank 30] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.870505 12772 ProcessGroupNCCL.cpp:835] [Rank 77] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.877517 21464 ProcessGroupNCCL.cpp:835] [Rank 59] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.871263 28415 ProcessGroupNCCL.cpp:835] [Rank 5] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.876204 11733 ProcessGroupNCCL.cpp:835] [Rank 82] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.879371 20006 ProcessGroupNCCL.cpp:835] [Rank 15] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.878505 21466 ProcessGroupNCCL.cpp:835] [Rank 58] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.873036 16146 ProcessGroupNCCL.cpp:835] [Rank 74] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.873725  8799 ProcessGroupNCCL.cpp:835] [Rank 90] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.875123 28542 ProcessGroupNCCL.cpp:835] [Rank 70] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.878775 17488 ProcessGroupNCCL.cpp:835] [Rank 10] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.876260 29322 ProcessGroupNCCL.cpp:835] [Rank 37] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.879299 17486 ProcessGroupNCCL.cpp:835] [Rank 9] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.879406 11734 ProcessGroupNCCL.cpp:835] [Rank 81] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.874640 28417 ProcessGroupNCCL.cpp:835] [Rank 7] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.883296 20007 ProcessGroupNCCL.cpp:835] [Rank 14] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.882755  3911 ProcessGroupNCCL.cpp:835] [Rank 41] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.877713 16144 ProcessGroupNCCL.cpp:835] [Rank 75] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.877140 12769 ProcessGroupNCCL.cpp:835] [Rank 78] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.880501 29323 ProcessGroupNCCL.cpp:835] [Rank 38] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.881523 28541 ProcessGroupNCCL.cpp:835] [Rank 69] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.884120  8888 ProcessGroupNCCL.cpp:835] [Rank 33] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.888684  9211 ProcessGroupNCCL.cpp:835] [Rank 29] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.885823 29321 ProcessGroupNCCL.cpp:835] [Rank 36] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.883777 28414 ProcessGroupNCCL.cpp:835] [Rank 6] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.890960  3910 ProcessGroupNCCL.cpp:835] [Rank 40] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.886875 28540 ProcessGroupNCCL.cpp:835] [Rank 68] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.896919  3912 ProcessGroupNCCL.cpp:835] [Rank 43] NCCL watchdog thread started!
I1109 17:32:11.894495 32369 ProcessGroupNCCL.cpp:669] [Rank 93] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899181  3627 ProcessGroupNCCL.cpp:669] [Rank 23] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.891537 20870 ProcessGroupNCCL.cpp:669] [Rank 84] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899199  3630 ProcessGroupNCCL.cpp:669] [Rank 22] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.891541 20872 ProcessGroupNCCL.cpp:669] [Rank 86] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.894497 32370 ProcessGroupNCCL.cpp:669] [Rank 92] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899215  3629 ProcessGroupNCCL.cpp:669] [Rank 21] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.891561 20873 ProcessGroupNCCL.cpp:669] [Rank 85] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.894515 32368 ProcessGroupNCCL.cpp:669] [Rank 95] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899230  3628 ProcessGroupNCCL.cpp:669] [Rank 20] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.891572 20871 ProcessGroupNCCL.cpp:669] [Rank 87] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.894526 32367 ProcessGroupNCCL.cpp:669] [Rank 94] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899583  4185 ProcessGroupNCCL.cpp:669] [Rank 62] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899597  4186 ProcessGroupNCCL.cpp:669] [Rank 61] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899607  4183 ProcessGroupNCCL.cpp:669] [Rank 63] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.899621  4184 ProcessGroupNCCL.cpp:669] [Rank 60] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.901499  3909 ProcessGroupNCCL.cpp:835] [Rank 42] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.903362 20005 ProcessGroupNCCL.cpp:835] [Rank 13] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.903832 21463 ProcessGroupNCCL.cpp:835] [Rank 56] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.902930 17485 ProcessGroupNCCL.cpp:835] [Rank 8] NCCL watchdog thread started!
I1109 17:32:11.914777 16951 ProcessGroupNCCL.cpp:669] [Rank 50] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.914790 16953 ProcessGroupNCCL.cpp:669] [Rank 51] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.914803 16952 ProcessGroupNCCL.cpp:669] [Rank 48] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.914813 16954 ProcessGroupNCCL.cpp:669] [Rank 49] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.914288  8710 ProcessGroupNCCL.cpp:835] [Rank 54] NCCL watchdog thread started!
I1109 17:32:11.924491   499 ProcessGroupNCCL.cpp:669] [Rank 44] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.924504   502 ProcessGroupNCCL.cpp:669] [Rank 47] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.924520   500 ProcessGroupNCCL.cpp:669] [Rank 45] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.924528   498 ProcessGroupNCCL.cpp:669] [Rank 46] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.917910  7394 ProcessGroupNCCL.cpp:669] [Rank 91] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.917922  7395 ProcessGroupNCCL.cpp:669] [Rank 89] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.917925  7397 ProcessGroupNCCL.cpp:669] [Rank 90] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.917932  7396 ProcessGroupNCCL.cpp:669] [Rank 88] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.918546 27304 ProcessGroupNCCL.cpp:669] [Rank 4] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.918557 27303 ProcessGroupNCCL.cpp:669] [Rank 5] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.920086  8707 ProcessGroupNCCL.cpp:835] [Rank 52] NCCL watchdog thread started!
I1109 17:32:11.918655 27305 ProcessGroupNCCL.cpp:669] [Rank 7] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.918668 27302 ProcessGroupNCCL.cpp:669] [Rank 6] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.923264 27608 ProcessGroupNCCL.cpp:669] [Rank 18] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.923282 27607 ProcessGroupNCCL.cpp:669] [Rank 19] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.923296 27606 ProcessGroupNCCL.cpp:669] [Rank 16] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.923310 27605 ProcessGroupNCCL.cpp:669] [Rank 17] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.928817 26238 ProcessGroupNCCL.cpp:669] [Rank 24] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.928844 26237 ProcessGroupNCCL.cpp:669] [Rank 26] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.928853 26240 ProcessGroupNCCL.cpp:669] [Rank 27] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.928856 26239 ProcessGroupNCCL.cpp:669] [Rank 25] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.922752  8708 ProcessGroupNCCL.cpp:835] [Rank 53] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:32:11.923930  8709 ProcessGroupNCCL.cpp:835] [Rank 55] NCCL watchdog thread started!
I1109 17:32:11.948721 14569 ProcessGroupNCCL.cpp:669] [Rank 74] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.948717 14567 ProcessGroupNCCL.cpp:669] [Rank 73] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.948745 14568 ProcessGroupNCCL.cpp:669] [Rank 72] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.948747 14570 ProcessGroupNCCL.cpp:669] [Rank 75] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.957391  7615 ProcessGroupNCCL.cpp:669] [Rank 55] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.957415  7612 ProcessGroupNCCL.cpp:669] [Rank 53] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.957425  7614 ProcessGroupNCCL.cpp:669] [Rank 52] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.957437  7613 ProcessGroupNCCL.cpp:669] [Rank 54] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.963528 16210 ProcessGroupNCCL.cpp:669] [Rank 11] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.963546 16211 ProcessGroupNCCL.cpp:669] [Rank 8] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.963562 16209 ProcessGroupNCCL.cpp:669] [Rank 10] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.963569 16208 ProcessGroupNCCL.cpp:669] [Rank 9] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965626 20107 ProcessGroupNCCL.cpp:669] [Rank 58] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965652 20109 ProcessGroupNCCL.cpp:669] [Rank 59] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965667 20110 ProcessGroupNCCL.cpp:669] [Rank 57] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965677 20108 ProcessGroupNCCL.cpp:669] [Rank 56] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965902  7788 ProcessGroupNCCL.cpp:669] [Rank 35] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965916  7789 ProcessGroupNCCL.cpp:669] [Rank 34] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965929  7791 ProcessGroupNCCL.cpp:669] [Rank 32] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.965946  7790 ProcessGroupNCCL.cpp:669] [Rank 33] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.967525 10614 ProcessGroupNCCL.cpp:669] [Rank 81] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.971107 18587 ProcessGroupNCCL.cpp:669] [Rank 12] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.971107 18589 ProcessGroupNCCL.cpp:669] [Rank 15] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.971128 18586 ProcessGroupNCCL.cpp:669] [Rank 14] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.971133 18588 ProcessGroupNCCL.cpp:669] [Rank 13] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.966792 27455 ProcessGroupNCCL.cpp:669] [Rank 69] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.971231  2438 ProcessGroupNCCL.cpp:669] [Rank 40] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.970362 10615 ProcessGroupNCCL.cpp:669] [Rank 82] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.970520 10613 ProcessGroupNCCL.cpp:669] [Rank 80] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.968080 27454 ProcessGroupNCCL.cpp:669] [Rank 70] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.974279  2437 ProcessGroupNCCL.cpp:669] [Rank 41] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.973049 10612 ProcessGroupNCCL.cpp:669] [Rank 83] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.970870 27453 ProcessGroupNCCL.cpp:669] [Rank 68] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.971002 27452 ProcessGroupNCCL.cpp:669] [Rank 71] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.977105  2439 ProcessGroupNCCL.cpp:669] [Rank 42] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.977252  2440 ProcessGroupNCCL.cpp:669] [Rank 43] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.971793 11305 ProcessGroupNCCL.cpp:669] [Rank 79] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.972015 11304 ProcessGroupNCCL.cpp:669] [Rank 76] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.973215 11303 ProcessGroupNCCL.cpp:669] [Rank 78] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.979091 11306 ProcessGroupNCCL.cpp:669] [Rank 77] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.991712 27953 ProcessGroupNCCL.cpp:669] [Rank 36] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.997380  8125 ProcessGroupNCCL.cpp:669] [Rank 28] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.997388  8126 ProcessGroupNCCL.cpp:669] [Rank 29] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.991752 27952 ProcessGroupNCCL.cpp:669] [Rank 37] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.997402  8123 ProcessGroupNCCL.cpp:669] [Rank 31] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.991752 27955 ProcessGroupNCCL.cpp:669] [Rank 39] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.997414  8124 ProcessGroupNCCL.cpp:669] [Rank 30] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:32:11.991767 27954 ProcessGroupNCCL.cpp:669] [Rank 38] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
I1109 17:32:17.568461 18664 ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: INFO
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:27<00:27, 27.97s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:43<00:00, 20.86s/it]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

I1109 17:34:02.045403 21770 ProcessGroupNCCL.cpp:835] [Rank 15] NCCL watchdog thread started!
I1109 17:34:02.045312 18589 ProcessGroupNCCL.cpp:669] [Rank 15] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.045508 18587 ProcessGroupNCCL.cpp:669] [Rank 12] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.045637 21771 ProcessGroupNCCL.cpp:835] [Rank 12] NCCL watchdog thread started!
I1109 17:34:02.045694 21772 ProcessGroupNCCL.cpp:835] [Rank 14] NCCL watchdog thread started!
I1109 17:34:02.045619 18586 ProcessGroupNCCL.cpp:669] [Rank 14] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.045776 18588 ProcessGroupNCCL.cpp:669] [Rank 13] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.045873 21773 ProcessGroupNCCL.cpp:835] [Rank 13] NCCL watchdog thread started!
I1109 17:34:02.059281 26240 ProcessGroupNCCL.cpp:669] [Rank 27] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.059396 29340 ProcessGroupNCCL.cpp:835] [Rank 27] NCCL watchdog thread started!
I1109 17:34:02.059641 26237 ProcessGroupNCCL.cpp:669] [Rank 26] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.059720 29341 ProcessGroupNCCL.cpp:835] [Rank 26] NCCL watchdog thread started!
I1109 17:34:02.059657 26238 ProcessGroupNCCL.cpp:669] [Rank 24] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.059757 29342 ProcessGroupNCCL.cpp:835] [Rank 24] NCCL watchdog thread started!
I1109 17:34:02.059723 26239 ProcessGroupNCCL.cpp:669] [Rank 25] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.059785 29343 ProcessGroupNCCL.cpp:835] [Rank 25] NCCL watchdog thread started!
I1109 17:34:02.052726 16208 ProcessGroupNCCL.cpp:669] [Rank 9] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.052829 19149 ProcessGroupNCCL.cpp:835] [Rank 9] NCCL watchdog thread started!
I1109 17:34:02.052804 16210 ProcessGroupNCCL.cpp:669] [Rank 11] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.052865 16211 ProcessGroupNCCL.cpp:669] [Rank 8] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.052960 19151 ProcessGroupNCCL.cpp:835] [Rank 8] NCCL watchdog thread started!
I1109 17:34:02.052891 19150 ProcessGroupNCCL.cpp:835] [Rank 11] NCCL watchdog thread started!
I1109 17:34:02.053032 16209 ProcessGroupNCCL.cpp:669] [Rank 10] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.053164 19152 ProcessGroupNCCL.cpp:835] [Rank 10] NCCL watchdog thread started!
I1109 17:34:02.060881  8126 ProcessGroupNCCL.cpp:669] [Rank 29] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.060943 10801 ProcessGroupNCCL.cpp:835] [Rank 29] NCCL watchdog thread started!
I1109 17:34:02.061071  8123 ProcessGroupNCCL.cpp:669] [Rank 31] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.061175 10802 ProcessGroupNCCL.cpp:835] [Rank 31] NCCL watchdog thread started!
I1109 17:34:02.061152  8125 ProcessGroupNCCL.cpp:669] [Rank 28] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.061255 10803 ProcessGroupNCCL.cpp:835] [Rank 28] NCCL watchdog thread started!
I1109 17:34:02.061368 10804 ProcessGroupNCCL.cpp:835] [Rank 30] NCCL watchdog thread started!
I1109 17:34:02.061254  8124 ProcessGroupNCCL.cpp:669] [Rank 30] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.055279 20872 ProcessGroupNCCL.cpp:669] [Rank 86] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.055294 20873 ProcessGroupNCCL.cpp:669] [Rank 85] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.055444 20871 ProcessGroupNCCL.cpp:669] [Rank 87] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.055516 23352 ProcessGroupNCCL.cpp:835] [Rank 87] NCCL watchdog thread started!
I1109 17:34:02.055392 23350 ProcessGroupNCCL.cpp:835] [Rank 86] NCCL watchdog thread started!
I1109 17:34:02.055410 23351 ProcessGroupNCCL.cpp:835] [Rank 85] NCCL watchdog thread started!
I1109 17:34:02.055728 20870 ProcessGroupNCCL.cpp:669] [Rank 84] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.055828 23353 ProcessGroupNCCL.cpp:835] [Rank 84] NCCL watchdog thread started!
I1109 17:34:02.086508 20107 ProcessGroupNCCL.cpp:669] [Rank 58] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.086602 23241 ProcessGroupNCCL.cpp:835] [Rank 58] NCCL watchdog thread started!
I1109 17:34:02.086592 20109 ProcessGroupNCCL.cpp:669] [Rank 59] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.086705 23242 ProcessGroupNCCL.cpp:835] [Rank 59] NCCL watchdog thread started!
I1109 17:34:02.086737 20108 ProcessGroupNCCL.cpp:669] [Rank 56] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.086845 23243 ProcessGroupNCCL.cpp:835] [Rank 56] NCCL watchdog thread started!
I1109 17:34:02.086966 20110 ProcessGroupNCCL.cpp:669] [Rank 57] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.087077 23244 ProcessGroupNCCL.cpp:835] [Rank 57] NCCL watchdog thread started!
I1109 17:34:02.089745 32368 ProcessGroupNCCL.cpp:669] [Rank 95] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.089812  2677 ProcessGroupNCCL.cpp:835] [Rank 95] NCCL watchdog thread started!
I1109 17:34:02.088073 32395 ProcessGroupNCCL.cpp:669] [Rank 66] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.088135  3198 ProcessGroupNCCL.cpp:835] [Rank 66] NCCL watchdog thread started!
I1109 17:34:02.088290 32398 ProcessGroupNCCL.cpp:669] [Rank 65] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.088407  3199 ProcessGroupNCCL.cpp:835] [Rank 65] NCCL watchdog thread started!
I1109 17:34:02.088351 32396 ProcessGroupNCCL.cpp:669] [Rank 64] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.088440  3200 ProcessGroupNCCL.cpp:835] [Rank 64] NCCL watchdog thread started!
I1109 17:34:02.088402 32397 ProcessGroupNCCL.cpp:669] [Rank 67] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.088497  3201 ProcessGroupNCCL.cpp:835] [Rank 67] NCCL watchdog thread started!
I1109 17:34:02.090261 32367 ProcessGroupNCCL.cpp:669] [Rank 94] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.090368  2678 ProcessGroupNCCL.cpp:835] [Rank 94] NCCL watchdog thread started!
I1109 17:34:02.090348 32370 ProcessGroupNCCL.cpp:669] [Rank 92] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.090437  2679 ProcessGroupNCCL.cpp:835] [Rank 92] NCCL watchdog thread started!
I1109 17:34:02.090436  2680 ProcessGroupNCCL.cpp:835] [Rank 93] NCCL watchdog thread started!
I1109 17:34:02.090364 32369 ProcessGroupNCCL.cpp:669] [Rank 93] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.105688 11305 ProcessGroupNCCL.cpp:669] [Rank 79] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.105784 14579 ProcessGroupNCCL.cpp:835] [Rank 79] NCCL watchdog thread started!
I1109 17:34:02.106035 11304 ProcessGroupNCCL.cpp:669] [Rank 76] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.106051 11306 ProcessGroupNCCL.cpp:669] [Rank 77] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.106173 14580 ProcessGroupNCCL.cpp:835] [Rank 76] NCCL watchdog thread started!
I1109 17:34:02.106189 14581 ProcessGroupNCCL.cpp:835] [Rank 77] NCCL watchdog thread started!
I1109 17:34:02.106168 11303 ProcessGroupNCCL.cpp:669] [Rank 78] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.106264 14582 ProcessGroupNCCL.cpp:835] [Rank 78] NCCL watchdog thread started!
I1109 17:34:02.147316  5942 ProcessGroupNCCL.cpp:835] [Rank 43] NCCL watchdog thread started!
I1109 17:34:02.147264  2440 ProcessGroupNCCL.cpp:669] [Rank 43] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.147325  2438 ProcessGroupNCCL.cpp:669] [Rank 40] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.147454  5943 ProcessGroupNCCL.cpp:835] [Rank 40] NCCL watchdog thread started!
I1109 17:34:02.138859 27455 ProcessGroupNCCL.cpp:669] [Rank 69] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.138959 30007 ProcessGroupNCCL.cpp:835] [Rank 69] NCCL watchdog thread started!
I1109 17:34:02.139061 27454 ProcessGroupNCCL.cpp:669] [Rank 70] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.139185 30008 ProcessGroupNCCL.cpp:835] [Rank 70] NCCL watchdog thread started!
I1109 17:34:02.137028 27304 ProcessGroupNCCL.cpp:669] [Rank 4] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137166 29712 ProcessGroupNCCL.cpp:835] [Rank 4] NCCL watchdog thread started!
I1109 17:34:02.147799  2439 ProcessGroupNCCL.cpp:669] [Rank 42] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137094 27305 ProcessGroupNCCL.cpp:669] [Rank 7] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137228 29713 ProcessGroupNCCL.cpp:835] [Rank 7] NCCL watchdog thread started!
I1109 17:34:02.147944  5944 ProcessGroupNCCL.cpp:835] [Rank 42] NCCL watchdog thread started!
I1109 17:34:02.147948  5945 ProcessGroupNCCL.cpp:835] [Rank 41] NCCL watchdog thread started!
I1109 17:34:02.137269 27303 ProcessGroupNCCL.cpp:669] [Rank 5] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.147897  2437 ProcessGroupNCCL.cpp:669] [Rank 41] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137362 29715 ProcessGroupNCCL.cpp:835] [Rank 5] NCCL watchdog thread started!
I1109 17:34:02.137820 22313 ProcessGroupNCCL.cpp:835] [Rank 3] NCCL watchdog thread started!
I1109 17:34:02.137753 18666 ProcessGroupNCCL.cpp:669] [Rank 3] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.139317 27453 ProcessGroupNCCL.cpp:669] [Rank 68] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.139441 30009 ProcessGroupNCCL.cpp:835] [Rank 68] NCCL watchdog thread started!
I1109 17:34:02.139469 30010 ProcessGroupNCCL.cpp:835] [Rank 71] NCCL watchdog thread started!
I1109 17:34:02.139461 27452 ProcessGroupNCCL.cpp:669] [Rank 71] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137634 29716 ProcessGroupNCCL.cpp:835] [Rank 6] NCCL watchdog thread started!
I1109 17:34:02.137140 10843 ProcessGroupNCCL.cpp:835] [Rank 89] NCCL watchdog thread started!
I1109 17:34:02.137045  7395 ProcessGroupNCCL.cpp:669] [Rank 89] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137578 27302 ProcessGroupNCCL.cpp:669] [Rank 6] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.138039 18665 ProcessGroupNCCL.cpp:669] [Rank 2] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137290 10844 ProcessGroupNCCL.cpp:835] [Rank 90] NCCL watchdog thread started!
I1109 17:34:02.138054 18663 ProcessGroupNCCL.cpp:669] [Rank 1] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.138171 22314 ProcessGroupNCCL.cpp:835] [Rank 2] NCCL watchdog thread started!
I1109 17:34:02.138165 22315 ProcessGroupNCCL.cpp:835] [Rank 1] NCCL watchdog thread started!
I1109 17:34:02.137368 10845 ProcessGroupNCCL.cpp:835] [Rank 88] NCCL watchdog thread started!
I1109 17:34:02.137208  7397 ProcessGroupNCCL.cpp:669] [Rank 90] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137271  7396 ProcessGroupNCCL.cpp:669] [Rank 88] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137370  7394 ProcessGroupNCCL.cpp:669] [Rank 91] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.137477 10846 ProcessGroupNCCL.cpp:835] [Rank 91] NCCL watchdog thread started!
I1109 17:34:02.138484 18664 ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.138576 22316 ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
I1109 17:34:02.138808 17920 ProcessGroupNCCL.cpp:835] [Rank 73] NCCL watchdog thread started!
I1109 17:34:02.138722 14567 ProcessGroupNCCL.cpp:669] [Rank 73] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.138833 17921 ProcessGroupNCCL.cpp:835] [Rank 74] NCCL watchdog thread started!
I1109 17:34:02.138751 14569 ProcessGroupNCCL.cpp:669] [Rank 74] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.139180 17922 ProcessGroupNCCL.cpp:835] [Rank 75] NCCL watchdog thread started!
I1109 17:34:02.139096 14568 ProcessGroupNCCL.cpp:669] [Rank 72] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.139076 14570 ProcessGroupNCCL.cpp:669] [Rank 75] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.139199 17923 ProcessGroupNCCL.cpp:835] [Rank 72] NCCL watchdog thread started!
I1109 17:34:02.147156 20212 ProcessGroupNCCL.cpp:835] [Rank 51] NCCL watchdog thread started!
I1109 17:34:02.147053 16953 ProcessGroupNCCL.cpp:669] [Rank 51] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.147470 20213 ProcessGroupNCCL.cpp:835] [Rank 50] NCCL watchdog thread started!
I1109 17:34:02.147358 16951 ProcessGroupNCCL.cpp:669] [Rank 50] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.147431 16954 ProcessGroupNCCL.cpp:669] [Rank 49] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.147557 20214 ProcessGroupNCCL.cpp:835] [Rank 49] NCCL watchdog thread started!
I1109 17:34:02.147569 16952 ProcessGroupNCCL.cpp:669] [Rank 48] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.147677 20215 ProcessGroupNCCL.cpp:835] [Rank 48] NCCL watchdog thread started!
I1109 17:34:02.139884 27952 ProcessGroupNCCL.cpp:669] [Rank 37] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.140003 31226 ProcessGroupNCCL.cpp:835] [Rank 37] NCCL watchdog thread started!
I1109 17:34:02.139963 27954 ProcessGroupNCCL.cpp:669] [Rank 38] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.140075 31227 ProcessGroupNCCL.cpp:835] [Rank 38] NCCL watchdog thread started!
I1109 17:34:02.140292 27953 ProcessGroupNCCL.cpp:669] [Rank 36] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.140398 31228 ProcessGroupNCCL.cpp:835] [Rank 36] NCCL watchdog thread started!
I1109 17:34:02.140379 27955 ProcessGroupNCCL.cpp:669] [Rank 39] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.140478 31229 ProcessGroupNCCL.cpp:835] [Rank 39] NCCL watchdog thread started!
I1109 17:34:02.148700  4183 ProcessGroupNCCL.cpp:669] [Rank 63] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.148789  7764 ProcessGroupNCCL.cpp:835] [Rank 63] NCCL watchdog thread started!
I1109 17:34:02.149238  7765 ProcessGroupNCCL.cpp:835] [Rank 60] NCCL watchdog thread started!
I1109 17:34:02.149189  4184 ProcessGroupNCCL.cpp:669] [Rank 60] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.149498  7766 ProcessGroupNCCL.cpp:835] [Rank 62] NCCL watchdog thread started!
I1109 17:34:02.149426  4185 ProcessGroupNCCL.cpp:669] [Rank 62] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.149549  4186 ProcessGroupNCCL.cpp:669] [Rank 61] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.149657  7767 ProcessGroupNCCL.cpp:835] [Rank 61] NCCL watchdog thread started!
I1109 17:34:02.144274 10614 ProcessGroupNCCL.cpp:669] [Rank 81] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.144398 10615 ProcessGroupNCCL.cpp:669] [Rank 82] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.144348 13370 ProcessGroupNCCL.cpp:835] [Rank 81] NCCL watchdog thread started!
I1109 17:34:02.144512 10612 ProcessGroupNCCL.cpp:669] [Rank 83] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.144651 13373 ProcessGroupNCCL.cpp:835] [Rank 80] NCCL watchdog thread started!
I1109 17:34:02.144521 13371 ProcessGroupNCCL.cpp:835] [Rank 82] NCCL watchdog thread started!
I1109 17:34:02.144598 13372 ProcessGroupNCCL.cpp:835] [Rank 83] NCCL watchdog thread started!
I1109 17:34:02.144588 10613 ProcessGroupNCCL.cpp:669] [Rank 80] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.153229  3627 ProcessGroupNCCL.cpp:669] [Rank 23] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.153342  6951 ProcessGroupNCCL.cpp:835] [Rank 23] NCCL watchdog thread started!
I1109 17:34:02.153278  3629 ProcessGroupNCCL.cpp:669] [Rank 21] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.153400  6952 ProcessGroupNCCL.cpp:835] [Rank 21] NCCL watchdog thread started!
I1109 17:34:02.153373  3630 ProcessGroupNCCL.cpp:669] [Rank 22] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.153483  6953 ProcessGroupNCCL.cpp:835] [Rank 22] NCCL watchdog thread started!
I1109 17:34:02.153525  3628 ProcessGroupNCCL.cpp:669] [Rank 20] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.153632  6954 ProcessGroupNCCL.cpp:835] [Rank 20] NCCL watchdog thread started!
I1109 17:34:02.154286  3707 ProcessGroupNCCL.cpp:835] [Rank 46] NCCL watchdog thread started!
I1109 17:34:02.154232   498 ProcessGroupNCCL.cpp:669] [Rank 46] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.154489   499 ProcessGroupNCCL.cpp:669] [Rank 44] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.154582  3708 ProcessGroupNCCL.cpp:835] [Rank 44] NCCL watchdog thread started!
I1109 17:34:02.154618  3709 ProcessGroupNCCL.cpp:835] [Rank 45] NCCL watchdog thread started!
I1109 17:34:02.154567   500 ProcessGroupNCCL.cpp:669] [Rank 45] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.154814  3710 ProcessGroupNCCL.cpp:835] [Rank 47] NCCL watchdog thread started!
I1109 17:34:02.154771   502 ProcessGroupNCCL.cpp:669] [Rank 47] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.160118  7615 ProcessGroupNCCL.cpp:669] [Rank 55] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.160204 10093 ProcessGroupNCCL.cpp:835] [Rank 55] NCCL watchdog thread started!
I1109 17:34:02.160220  7614 ProcessGroupNCCL.cpp:669] [Rank 52] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.160332 10094 ProcessGroupNCCL.cpp:835] [Rank 52] NCCL watchdog thread started!
I1109 17:34:02.160545  7613 ProcessGroupNCCL.cpp:669] [Rank 54] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.160670 10095 ProcessGroupNCCL.cpp:835] [Rank 54] NCCL watchdog thread started!
I1109 17:34:02.160699  7612 ProcessGroupNCCL.cpp:669] [Rank 53] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.160784 10096 ProcessGroupNCCL.cpp:835] [Rank 53] NCCL watchdog thread started!
I1109 17:34:02.189980 27607 ProcessGroupNCCL.cpp:669] [Rank 19] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.190091 30907 ProcessGroupNCCL.cpp:835] [Rank 19] NCCL watchdog thread started!
I1109 17:34:02.190423 27606 ProcessGroupNCCL.cpp:669] [Rank 16] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.190507 30908 ProcessGroupNCCL.cpp:835] [Rank 16] NCCL watchdog thread started!
I1109 17:34:02.190706 27608 ProcessGroupNCCL.cpp:669] [Rank 18] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.190807 30909 ProcessGroupNCCL.cpp:835] [Rank 18] NCCL watchdog thread started!
I1109 17:34:02.190833 27605 ProcessGroupNCCL.cpp:669] [Rank 17] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.190941 30910 ProcessGroupNCCL.cpp:835] [Rank 17] NCCL watchdog thread started!
I1109 17:34:02.226173 10491 ProcessGroupNCCL.cpp:835] [Rank 33] NCCL watchdog thread started!
I1109 17:34:02.226091  7790 ProcessGroupNCCL.cpp:669] [Rank 33] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.226142  7789 ProcessGroupNCCL.cpp:669] [Rank 34] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.226245 10492 ProcessGroupNCCL.cpp:835] [Rank 34] NCCL watchdog thread started!
I1109 17:34:02.226444  7788 ProcessGroupNCCL.cpp:669] [Rank 35] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.226452  7791 ProcessGroupNCCL.cpp:669] [Rank 32] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:34:02.226550 10494 ProcessGroupNCCL.cpp:835] [Rank 32] NCCL watchdog thread started!
I1109 17:34:02.226552 10493 ProcessGroupNCCL.cpp:835] [Rank 35] NCCL watchdog thread started!
I1109 17:34:03.274609 18664 ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: INFO
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 1/420 [00:25<3:01:03, 25.93s/it]
  0%|          | 1/420 [00:26<3:01:35, 26.00s/it]
  0%|          | 1/420 [00:26<3:01:36, 26.01s/it]
  0%|          | 1/420 [00:26<3:01:41, 26.02s/it]
  0%|          | 1/420 [00:26<3:01:34, 26.00s/it]
  0%|          | 1/420 [00:26<3:01:37, 26.01s/it]
  0%|          | 1/420 [00:26<3:01:38, 26.01s/it]
  0%|          | 1/420 [00:26<3:01:36, 26.00s/it]
  0%|          | 2/420 [00:37<2:00:03, 17.23s/it]
  0%|          | 2/420 [00:37<1:59:51, 17.20s/it]
  0%|          | 2/420 [00:37<2:00:05, 17.24s/it]
  0%|          | 2/420 [00:37<2:00:06, 17.24s/it]
  0%|          | 2/420 [00:37<2:00:04, 17.24s/it]
  1%|          | 3/420 [00:48<1:40:10, 14.41s/it]
  1%|          | 3/420 [00:48<1:40:17, 14.43s/it]
  1%|          | 3/420 [00:48<1:40:18, 14.43s/it]
  1%|          | 3/420 [00:48<1:40:19, 14.44s/it]
  1%|          | 3/420 [00:48<1:40:16, 14.43s/it]
  1%|          | 3/420 [00:48<1:40:19, 14.43s/it]
  1%|          | 4/420 [00:59<1:30:48, 13.10s/it]
  1%|          | 4/420 [00:59<1:30:55, 13.11s/it]
  1%|          | 4/420 [00:59<1:30:53, 13.11s/it]
  1%|          | 4/420 [00:59<1:30:52, 13.11s/it]
  1%|          | 4/420 [00:59<1:30:54, 13.11s/it]
  1%|          | 5/420 [01:10<1:25:34, 12.37s/it]
  1%|          | 5/420 [01:10<1:25:36, 12.38s/it]
  1%|          | 5/420 [01:10<1:25:37, 12.38s/it]
  1%|          | 5/420 [01:10<1:25:38, 12.38s/it]
  1%|▏         | 6/420 [01:21<1:22:21, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏        
 | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]                                                                                                                                                                                                                                                       1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]                                                                                                    1%|▏         | 6/420 [01:21<1:22:21, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 
11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:22, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]                                                   1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:24, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  1%|▏         | 6/420 [01:21<1:22:23, 11.94s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]                                                                                      
                                                                                                                                                                                                                  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]                                                   2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.67s/it]  2%|▏        
 | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:18, 11.67s/it]  2%|▏         | 7/420 [01:32<1:20:17, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 7/420 [01:32<1:20:16, 11.66s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                           2%|▏   
      | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]                                                                                                                                                     2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:52, 11.49s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:51, 11.48s/it]  2%|▏         | 8/420 [01:43<1:18:50, 11.48s/it]  2%|▏         | 9/420 [01:54<1:17:49, 
11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                            2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]                                                                                                                                                     2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:49, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]                                                   2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:50, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 9/420 [01:54<1:17:48, 11.36s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏ 
        | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]                                                                                                                                                                                                                                                                                                                                                              
                                                                                                      2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]                                                    2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:04, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:03, 11.28s/it]  2%|▏         | 10/420 [02:05<1:17:05, 11.28s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]  
3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]                                                                                                                                                                                                                                                                                                                                                                                                                  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]                                                                                                                                                
                                                          3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]                                                    3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:28, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:29, 11.22s/it]  3%|▎         | 11/420 [02:16<1:16:30, 11.22s/it]  3%|▎         | 12/420 [02:27<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:03, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:27<1:16:01, 11.18s/it]
  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        3%|▎         | 12/420 [02:27<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:27<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]                                                    3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 
[02:28<1:16:03, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:02, 11.18s/it]  3%|▎         | 12/420 [02:28<1:16:01, 11.18s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:40, 11.16s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:40, 11.16s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:40, 11.16s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]                                                                                    
                                                                                                                                                                                                                                                                                                                              3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:40, 11.16s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]                                                    3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]                                                                                                      3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 
13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:38, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:40, 11.16s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 13/420 [02:39<1:15:40, 11.16s/it]  3%|▎         | 13/420 [02:39<1:15:39, 11.15s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:22, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.13s/it]  3%|▎         | 14/420 [02:50<1:15:22, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:20, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]  3%|▎         | 14/420 [02:50<1:15:22, 11.14s/it]                                                                                                                                                                                                                                                                                                              3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]                     
  3%|▎         | 14/420 [02:50<1:15:21, 11.14s/it]
  4%|▎         | 15/420 [03:01<1:15:20, 11.16s/it]
  4%|▍         | 16/420 [03:13<1:16:01, 11.29s/it]
  4%|▍         | 17/420 [03:24<1:15:25, 11.23s/it]
  4%|▍         | 18/420 [03:35<1:15:05, 11.21s/it]
  5%|▍         | 19/420 [03:46<1:15:49, 11.34s/it]
  5%|▍         | 20/420 [03:58<1:16:16, 11.44s/it]
  5%|▌         | 21/420 [04:10<1:16:22, 11.49s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
│   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:30, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:30, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:30, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:30, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:30, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:31, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:31, 12.56s/it]
  5%|▌         | 21/420 [04:23<1:23:30, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
│   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │
            │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:31, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:31, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
  5%|▌         | 21/420 [04:23<1:23:31, 12.56s/it]
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:32, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:31, 12.56s/it]
  5%|▌         | 21/420 [04:23<1:23:32, 12.56s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  5%|▌         | 21/420 [04:23<1:23:31, 12.56s/it]
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[52229,1],33]
  Exit code:    1
--------------------------------------------------------------------------