WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.634884   316 ProcessGroupNCCL.cpp:835] [Rank 51] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.636107   314 ProcessGroupNCCL.cpp:835] [Rank 50] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.637125   313 ProcessGroupNCCL.cpp:835] [Rank 49] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.637635   315 ProcessGroupNCCL.cpp:835] [Rank 48] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.701238 28451 ProcessGroupNCCL.cpp:835] [Rank 43] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.702311 28454 ProcessGroupNCCL.cpp:835] [Rank 42] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.702343 28453 ProcessGroupNCCL.cpp:835] [Rank 40] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.702579 28452 ProcessGroupNCCL.cpp:835] [Rank 41] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696049 29925 ProcessGroupNCCL.cpp:835] [Rank 29] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696075 29926 ProcessGroupNCCL.cpp:835] [Rank 31] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696539 29927 ProcessGroupNCCL.cpp:835] [Rank 28] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696652 29928 ProcessGroupNCCL.cpp:835] [Rank 30] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.694564 21263 ProcessGroupNCCL.cpp:835] [Rank 90] NCCL watchdog thread started!
I1109 17:19:25.698573 28932 ProcessGroupNCCL.cpp:669] [Rank 29] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.698594 28934 ProcessGroupNCCL.cpp:669] [Rank 31] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.698602 28930 ProcessGroupNCCL.cpp:669] [Rank 28] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.698621 28933 ProcessGroupNCCL.cpp:669] [Rank 30] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696197  1770 ProcessGroupNCCL.cpp:835] [Rank 4] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696228  1769 ProcessGroupNCCL.cpp:835] [Rank 5] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696321  1771 ProcessGroupNCCL.cpp:835] [Rank 6] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.696352  1772 ProcessGroupNCCL.cpp:835] [Rank 7] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.695686 21264 ProcessGroupNCCL.cpp:835] [Rank 88] NCCL watchdog thread started!
I1109 17:19:25.698832  1095 ProcessGroupNCCL.cpp:669] [Rank 4] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.698846  1097 ProcessGroupNCCL.cpp:669] [Rank 5] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.698870  1100 ProcessGroupNCCL.cpp:669] [Rank 6] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.698879  1101 ProcessGroupNCCL.cpp:669] [Rank 7] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.698160 21262 ProcessGroupNCCL.cpp:835] [Rank 89] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.698261 21261 ProcessGroupNCCL.cpp:835] [Rank 91] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.712224 18831 ProcessGroupNCCL.cpp:835] [Rank 21] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.697925 13107 ProcessGroupNCCL.cpp:835] [Rank 54] NCCL watchdog thread started!
I1109 17:19:25.712168 17854 ProcessGroupNCCL.cpp:669] [Rank 21] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.712241 17852 ProcessGroupNCCL.cpp:669] [Rank 20] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.697975 13106 ProcessGroupNCCL.cpp:835] [Rank 55] NCCL watchdog thread started!
I1109 17:19:25.712395 18832 ProcessGroupNCCL.cpp:835] [Rank 20] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.697989 13105 ProcessGroupNCCL.cpp:835] [Rank 52] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.712639 18833 ProcessGroupNCCL.cpp:835] [Rank 23] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.711086 19240 ProcessGroupNCCL.cpp:835] [Rank 59] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.698117 13104 ProcessGroupNCCL.cpp:835] [Rank 53] NCCL watchdog thread started!
I1109 17:19:25.712625 17856 ProcessGroupNCCL.cpp:669] [Rank 23] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.711124 19241 ProcessGroupNCCL.cpp:835] [Rank 56] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.705397 29445 ProcessGroupNCCL.cpp:835] [Rank 82] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.712685 18834 ProcessGroupNCCL.cpp:835] [Rank 22] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.711171 19242 ProcessGroupNCCL.cpp:835] [Rank 57] NCCL watchdog thread started!
I1109 17:19:25.712682 17855 ProcessGroupNCCL.cpp:669] [Rank 22] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.711387 19243 ProcessGroupNCCL.cpp:835] [Rank 58] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.705878 29444 ProcessGroupNCCL.cpp:835] [Rank 80] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.700208 19601 ProcessGroupNCCL.cpp:835] [Rank 45] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.702873 23814 ProcessGroupNCCL.cpp:835] [Rank 75] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.703265 23813 ProcessGroupNCCL.cpp:835] [Rank 72] NCCL watchdog thread started!
I1109 17:19:25.713023 18592 ProcessGroupNCCL.cpp:669] [Rank 56] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.713052 18595 ProcessGroupNCCL.cpp:669] [Rank 58] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.713052 18594 ProcessGroupNCCL.cpp:669] [Rank 57] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.713088 18596 ProcessGroupNCCL.cpp:669] [Rank 59] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.701588 19602 ProcessGroupNCCL.cpp:835] [Rank 44] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.701682 19600 ProcessGroupNCCL.cpp:835] [Rank 47] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.701895 19599 ProcessGroupNCCL.cpp:835] [Rank 46] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.704396 23815 ProcessGroupNCCL.cpp:835] [Rank 74] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.704600 23812 ProcessGroupNCCL.cpp:835] [Rank 73] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.716840  1493 ProcessGroupNCCL.cpp:835] [Rank 19] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.709676 29443 ProcessGroupNCCL.cpp:835] [Rank 83] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.717368  1492 ProcessGroupNCCL.cpp:835] [Rank 17] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.717396  1490 ProcessGroupNCCL.cpp:835] [Rank 16] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.717446  1491 ProcessGroupNCCL.cpp:835] [Rank 18] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.714345 21425 ProcessGroupNCCL.cpp:835] [Rank 9] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.714781 21424 ProcessGroupNCCL.cpp:835] [Rank 10] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.690129 14391 ProcessGroupNCCL.cpp:835] [Rank 2] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.691341 14392 ProcessGroupNCCL.cpp:835] [Rank 1] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.697330 14389 ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.697657 14390 ProcessGroupNCCL.cpp:835] [Rank 3] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.719835 24903 ProcessGroupNCCL.cpp:835] [Rank 68] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.713358 31151 ProcessGroupNCCL.cpp:835] [Rank 87] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.720335 24901 ProcessGroupNCCL.cpp:835] [Rank 71] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.720355 24902 ProcessGroupNCCL.cpp:835] [Rank 70] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.714143 31152 ProcessGroupNCCL.cpp:835] [Rank 84] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.714625 31150 ProcessGroupNCCL.cpp:835] [Rank 85] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.715049 31149 ProcessGroupNCCL.cpp:835] [Rank 86] NCCL watchdog thread started!
I1109 17:19:25.721676 23916 ProcessGroupNCCL.cpp:669] [Rank 68] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.721753 23920 ProcessGroupNCCL.cpp:669] [Rank 71] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.721765 23919 ProcessGroupNCCL.cpp:669] [Rank 70] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.723300 24900 ProcessGroupNCCL.cpp:835] [Rank 69] NCCL watchdog thread started!
I1109 17:19:25.723286 23918 ProcessGroupNCCL.cpp:669] [Rank 69] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.712914  2138 ProcessGroupNCCL.cpp:835] [Rank 76] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.722712 21423 ProcessGroupNCCL.cpp:835] [Rank 8] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.714594  2139 ProcessGroupNCCL.cpp:835] [Rank 77] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.728610 21422 ProcessGroupNCCL.cpp:835] [Rank 11] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.728050 29442 ProcessGroupNCCL.cpp:835] [Rank 81] NCCL watchdog thread started!
I1109 17:19:25.737735   532 ProcessGroupNCCL.cpp:669] [Rank 18] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.737759   529 ProcessGroupNCCL.cpp:669] [Rank 16] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.737797   531 ProcessGroupNCCL.cpp:669] [Rank 17] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.737850   533 ProcessGroupNCCL.cpp:669] [Rank 19] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.741814  2375 ProcessGroupNCCL.cpp:835] [Rank 63] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.729055 29786 ProcessGroupNCCL.cpp:835] [Rank 32] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.729046 29788 ProcessGroupNCCL.cpp:835] [Rank 34] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.730794 29787 ProcessGroupNCCL.cpp:835] [Rank 35] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.743968  2374 ProcessGroupNCCL.cpp:835] [Rank 61] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.745115  2376 ProcessGroupNCCL.cpp:835] [Rank 60] NCCL watchdog thread started!
I1109 17:19:25.743041 27563 ProcessGroupNCCL.cpp:669] [Rank 42] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.743072 27564 ProcessGroupNCCL.cpp:669] [Rank 43] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.743113 27561 ProcessGroupNCCL.cpp:669] [Rank 40] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.743125 27562 ProcessGroupNCCL.cpp:669] [Rank 41] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.732630  2141 ProcessGroupNCCL.cpp:835] [Rank 79] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.732650  2140 ProcessGroupNCCL.cpp:835] [Rank 78] NCCL watchdog thread started!
I1109 17:19:25.741647 30090 ProcessGroupNCCL.cpp:669] [Rank 84] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.741659 30092 ProcessGroupNCCL.cpp:669] [Rank 85] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.741683 30093 ProcessGroupNCCL.cpp:669] [Rank 86] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.741693 30094 ProcessGroupNCCL.cpp:669] [Rank 87] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.750344  2373 ProcessGroupNCCL.cpp:835] [Rank 62] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.752183  2651 ProcessGroupNCCL.cpp:835] [Rank 94] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.752539  2650 ProcessGroupNCCL.cpp:835] [Rank 95] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.752895  2648 ProcessGroupNCCL.cpp:835] [Rank 92] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.752992  2649 ProcessGroupNCCL.cpp:835] [Rank 93] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.744205 30975 ProcessGroupNCCL.cpp:835] [Rank 25] NCCL watchdog thread started!
I1109 17:19:25.744140 30003 ProcessGroupNCCL.cpp:669] [Rank 25] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.744231 30977 ProcessGroupNCCL.cpp:835] [Rank 24] NCCL watchdog thread started!
I1109 17:19:25.744155 30001 ProcessGroupNCCL.cpp:669] [Rank 24] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.744314 30976 ProcessGroupNCCL.cpp:835] [Rank 26] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.744346 30978 ProcessGroupNCCL.cpp:835] [Rank 27] NCCL watchdog thread started!
I1109 17:19:25.744437 30005 ProcessGroupNCCL.cpp:669] [Rank 27] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.744467 30004 ProcessGroupNCCL.cpp:669] [Rank 26] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.757555 30796 ProcessGroupNCCL.cpp:835] [Rank 14] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.758790 30795 ProcessGroupNCCL.cpp:835] [Rank 13] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.759550 30794 ProcessGroupNCCL.cpp:835] [Rank 12] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.748878 19131 ProcessGroupNCCL.cpp:835] [Rank 65] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.749011 19132 ProcessGroupNCCL.cpp:835] [Rank 66] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.760469 30797 ProcessGroupNCCL.cpp:835] [Rank 15] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.751334 29785 ProcessGroupNCCL.cpp:835] [Rank 33] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.750589 19129 ProcessGroupNCCL.cpp:835] [Rank 67] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.753679 19130 ProcessGroupNCCL.cpp:835] [Rank 64] NCCL watchdog thread started!
I1109 17:19:25.753633 20316 ProcessGroupNCCL.cpp:669] [Rank 88] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.753656 20319 ProcessGroupNCCL.cpp:669] [Rank 90] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.753669 20318 ProcessGroupNCCL.cpp:669] [Rank 89] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.753690 20320 ProcessGroupNCCL.cpp:669] [Rank 91] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.770100 20544 ProcessGroupNCCL.cpp:669] [Rank 9] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.770109 20545 ProcessGroupNCCL.cpp:669] [Rank 10] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.770121 20546 ProcessGroupNCCL.cpp:669] [Rank 11] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.770131 20542 ProcessGroupNCCL.cpp:669] [Rank 8] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.771445 18265 ProcessGroupNCCL.cpp:669] [Rank 65] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.771474 18266 ProcessGroupNCCL.cpp:669] [Rank 66] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.771490 18267 ProcessGroupNCCL.cpp:669] [Rank 67] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.771502 18263 ProcessGroupNCCL.cpp:669] [Rank 64] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.799934  7626 ProcessGroupNCCL.cpp:835] [Rank 38] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.802338  7627 ProcessGroupNCCL.cpp:835] [Rank 36] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.803121  7625 ProcessGroupNCCL.cpp:835] [Rank 39] NCCL watchdog thread started!
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1109 17:19:25.806672  7624 ProcessGroupNCCL.cpp:835] [Rank 37] NCCL watchdog thread started!
I1109 17:19:25.801447 18606 ProcessGroupNCCL.cpp:669] [Rank 44] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.801455 18608 ProcessGroupNCCL.cpp:669] [Rank 45] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.801463 18609 ProcessGroupNCCL.cpp:669] [Rank 46] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.801476 18610 ProcessGroupNCCL.cpp:669] [Rank 47] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813210  1700 ProcessGroupNCCL.cpp:669] [Rank 92] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813227  1703 ProcessGroupNCCL.cpp:669] [Rank 95] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813705 29746 ProcessGroupNCCL.cpp:669] [Rank 15] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813247  1701 ProcessGroupNCCL.cpp:669] [Rank 93] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813721 29742 ProcessGroupNCCL.cpp:669] [Rank 12] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813274  1702 ProcessGroupNCCL.cpp:669] [Rank 94] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813735 29744 ProcessGroupNCCL.cpp:669] [Rank 13] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802896  1469 ProcessGroupNCCL.cpp:669] [Rank 79] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.813747 29745 ProcessGroupNCCL.cpp:669] [Rank 14] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802894  1468 ProcessGroupNCCL.cpp:669] [Rank 78] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802884  1465 ProcessGroupNCCL.cpp:669] [Rank 76] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.812557 32031 ProcessGroupNCCL.cpp:669] [Rank 49] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802918  1467 ProcessGroupNCCL.cpp:669] [Rank 77] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.812590 32030 ProcessGroupNCCL.cpp:669] [Rank 48] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.812585 32033 ProcessGroupNCCL.cpp:669] [Rank 51] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.812610 32032 ProcessGroupNCCL.cpp:669] [Rank 50] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802479 11956 ProcessGroupNCCL.cpp:669] [Rank 53] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802495 11954 ProcessGroupNCCL.cpp:669] [Rank 52] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802512 11957 ProcessGroupNCCL.cpp:669] [Rank 54] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.802527 11958 ProcessGroupNCCL.cpp:669] [Rank 55] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.804852 13224 ProcessGroupNCCL.cpp:669] [Rank 3] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.804855 13221 ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.804890 13222 ProcessGroupNCCL.cpp:669] [Rank 1] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.804898 13223 ProcessGroupNCCL.cpp:669] [Rank 2] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.819341  1636 ProcessGroupNCCL.cpp:669] [Rank 62] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.819362  1637 ProcessGroupNCCL.cpp:669] [Rank 63] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.819371  1633 ProcessGroupNCCL.cpp:669] [Rank 60] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.819381  1635 ProcessGroupNCCL.cpp:669] [Rank 61] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.838920  6738 ProcessGroupNCCL.cpp:669] [Rank 37] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.838945  6736 ProcessGroupNCCL.cpp:669] [Rank 36] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.838969  6739 ProcessGroupNCCL.cpp:669] [Rank 38] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.838966  6740 ProcessGroupNCCL.cpp:669] [Rank 39] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871241 22930 ProcessGroupNCCL.cpp:669] [Rank 72] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871281 22932 ProcessGroupNCCL.cpp:669] [Rank 73] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871290 22934 ProcessGroupNCCL.cpp:669] [Rank 75] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871305 22933 ProcessGroupNCCL.cpp:669] [Rank 74] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871773 29128 ProcessGroupNCCL.cpp:669] [Rank 34] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871788 29129 ProcessGroupNCCL.cpp:669] [Rank 35] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871793 29127 ProcessGroupNCCL.cpp:669] [Rank 33] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.871801 29125 ProcessGroupNCCL.cpp:669] [Rank 32] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.877801 28710 ProcessGroupNCCL.cpp:669] [Rank 83] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.877813 28706 ProcessGroupNCCL.cpp:669] [Rank 80] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.877831 28708 ProcessGroupNCCL.cpp:669] [Rank 81] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:19:25.877843 28709 ProcessGroupNCCL.cpp:669] [Rank 82] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
I1109 17:19:30.471735 13221 ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: INFO
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 
[00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%| 
         | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading 
checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.20s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.10s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.90s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.94s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.02s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.12s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.12s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.13s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.12s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.90s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.94s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.03s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.05s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.92s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.87s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.87s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.91s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.97s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.96s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.86s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.99s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.08s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.89s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 
30.07s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.92s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.09s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.95s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.93s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.99s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.00s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.04s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.90s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.94s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.03s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.06s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.92s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.87s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.88s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.91s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.97s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.96s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.86s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 30.00s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.08s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.89s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.07s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.92s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.11s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.95s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.93s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.04s/it]Loading 
checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.00s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.04s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.90s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.94s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.03s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.05s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.91s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.87s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.88s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.91s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.97s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.96s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.86s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.00s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.07s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.89s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.06s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.90s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.11s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.95s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.93s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.98s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.00s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.04s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.06s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.92s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.87s/it]Loading checkpoint shards:  
50%|█████     | 1/2 [00:29<00:29, 29.88s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.91s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.97s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.96s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.86s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.00s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.08s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.89s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.06s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.92s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.11s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.94s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:29<00:29, 29.92s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.00s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.04s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.24s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.27s/it]Loading checkpoint shards:  50%|█████     | 1/2 [00:30<00:30, 30.17s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.30s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.39s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.39s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.57s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.34s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 
[00:47<00:00, 23.57s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.35s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.51s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.37s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.39s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.51s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.37s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.40s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.30s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.35s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.51s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.40s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.34s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.30s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.35s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.29s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.34s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.40s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.40s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.40s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.40s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 
100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.40s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.34s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.30s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.35s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.38s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.30s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.29s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.34s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.53s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.55s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.53s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.56s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.57s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 
[00:47<00:00, 22.36s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.30s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.35s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.39s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.30s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.29s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.34s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.53s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.55s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.53s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.55s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.50s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.54s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.52s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.39s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.31s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.29s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.34s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.32s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.53s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.55s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.54s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.56s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.50s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.54s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.52s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.56s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.51s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 22.33s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.53s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.55s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.54s/it]Loading checkpoint shards: 
100%|██████████| 2/2 [00:47<00:00, 23.55s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.50s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.54s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.52s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.56s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.46s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.51s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.50s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.54s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.52s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.56s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.46s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 
[00:47<00:00, 23.51s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.56s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.47s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.45s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.51s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.48s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:46<00:00, 23.49s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.41s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.58s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.41s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.59s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 22.37s/it]Loading checkpoint shards: 100%|██████████| 2/2 [00:47<00:00, 23.54s/it]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (159 > 64). Running this sequence through the model will result in indexing errors
I1109 17:21:30.232172 28708 ProcessGroupNCCL.cpp:669] [Rank 81] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.232273 31046 ProcessGroupNCCL.cpp:835] [Rank 81] NCCL watchdog thread started!
I1109 17:21:30.232368 28709 ProcessGroupNCCL.cpp:669] [Rank 82] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.232481 31047 ProcessGroupNCCL.cpp:835] [Rank 82] NCCL watchdog thread started!
I1109 17:21:30.229944  1885 ProcessGroupNCCL.cpp:835] [Rank 49] NCCL watchdog thread started!
I1109 17:21:30.229861 32031 ProcessGroupNCCL.cpp:669] [Rank 49] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.229918 32032 ProcessGroupNCCL.cpp:669] [Rank 50] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.230038  1886 ProcessGroupNCCL.cpp:835] [Rank 50] NCCL watchdog thread started!
I1109 17:21:30.232705 28710 ProcessGroupNCCL.cpp:669] [Rank 83] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.232802 31048 ProcessGroupNCCL.cpp:835] [Rank 83] NCCL watchdog thread started!
I1109 17:21:30.230407 32033 ProcessGroupNCCL.cpp:669] [Rank 51] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.230502  1887 ProcessGroupNCCL.cpp:835] [Rank 51] NCCL watchdog thread started!
I1109 17:21:30.230775 32030 ProcessGroupNCCL.cpp:669] [Rank 48] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.230895  1888 ProcessGroupNCCL.cpp:835] [Rank 48] NCCL watchdog thread started!
I1109 17:21:30.233544 28706 ProcessGroupNCCL.cpp:669] [Rank 80] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.233637 31049 ProcessGroupNCCL.cpp:835] [Rank 80] NCCL watchdog thread started!
I1109 17:21:30.238595 23159 ProcessGroupNCCL.cpp:835] [Rank 90] NCCL watchdog thread started!
I1109 17:21:30.238538 20319 ProcessGroupNCCL.cpp:669] [Rank 90] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.238708 20320 ProcessGroupNCCL.cpp:669] [Rank 91] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.238803 23160 ProcessGroupNCCL.cpp:835] [Rank 91] NCCL watchdog thread started!
I1109 17:21:30.239065 20318 ProcessGroupNCCL.cpp:669] [Rank 89] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.239179 23161 ProcessGroupNCCL.cpp:835] [Rank 89] NCCL watchdog thread started!
I1109 17:21:30.239212 23162 ProcessGroupNCCL.cpp:835] [Rank 88] NCCL watchdog thread started!
I1109 17:21:30.239173 20316 ProcessGroupNCCL.cpp:669] [Rank 88] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.301719 28930 ProcessGroupNCCL.cpp:669] [Rank 28] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.301832 31951 ProcessGroupNCCL.cpp:835] [Rank 28] NCCL watchdog thread started!
I1109 17:21:30.301964 28932 ProcessGroupNCCL.cpp:669] [Rank 29] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.301995 28933 ProcessGroupNCCL.cpp:669] [Rank 30] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.302104 31952 ProcessGroupNCCL.cpp:835] [Rank 29] NCCL watchdog thread started!
I1109 17:21:30.302109 31953 ProcessGroupNCCL.cpp:835] [Rank 30] NCCL watchdog thread started!
I1109 17:21:30.302201 28934 ProcessGroupNCCL.cpp:669] [Rank 31] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.302284 31954 ProcessGroupNCCL.cpp:835] [Rank 31] NCCL watchdog thread started!
I1109 17:21:30.331609 16338 ProcessGroupNCCL.cpp:835] [Rank 3] NCCL watchdog thread started!
I1109 17:21:30.331533 13224 ProcessGroupNCCL.cpp:669] [Rank 3] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.331691 13223 ProcessGroupNCCL.cpp:669] [Rank 2] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.331609 13221 ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.331688 16339 ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
I1109 17:21:30.331826 16340 ProcessGroupNCCL.cpp:835] [Rank 2] NCCL watchdog thread started!
I1109 17:21:30.331841 16341 ProcessGroupNCCL.cpp:835] [Rank 1] NCCL watchdog thread started!
I1109 17:21:30.331806 13222 ProcessGroupNCCL.cpp:669] [Rank 1] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.333994  1637 ProcessGroupNCCL.cpp:669] [Rank 63] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.334096  3886 ProcessGroupNCCL.cpp:835] [Rank 63] NCCL watchdog thread started!
I1109 17:21:30.334051  1633 ProcessGroupNCCL.cpp:669] [Rank 60] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.334182  3888 ProcessGroupNCCL.cpp:835] [Rank 61] NCCL watchdog thread started!
I1109 17:21:30.334098  1635 ProcessGroupNCCL.cpp:669] [Rank 61] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.334167  3887 ProcessGroupNCCL.cpp:835] [Rank 60] NCCL watchdog thread started!
I1109 17:21:30.334408  3889 ProcessGroupNCCL.cpp:835] [Rank 62] NCCL watchdog thread started!
I1109 17:21:30.334314  1636 ProcessGroupNCCL.cpp:669] [Rank 62] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.343241 18263 ProcessGroupNCCL.cpp:669] [Rank 64] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.343325 21076 ProcessGroupNCCL.cpp:835] [Rank 64] NCCL watchdog thread started!
I1109 17:21:30.343264 18267 ProcessGroupNCCL.cpp:669] [Rank 67] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.343355 21077 ProcessGroupNCCL.cpp:835] [Rank 67] NCCL watchdog thread started!
I1109 17:21:30.343654 18266 ProcessGroupNCCL.cpp:669] [Rank 66] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.343760 21078 ProcessGroupNCCL.cpp:835] [Rank 66] NCCL watchdog thread started!
I1109 17:21:30.343690 18265 ProcessGroupNCCL.cpp:669] [Rank 65] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.343786 21079 ProcessGroupNCCL.cpp:835] [Rank 65] NCCL watchdog thread started!
I1109 17:21:30.358815  3710 ProcessGroupNCCL.cpp:835] [Rank 77] NCCL watchdog thread started!
I1109 17:21:30.358717  1467 ProcessGroupNCCL.cpp:669] [Rank 77] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.358798  1468 ProcessGroupNCCL.cpp:669] [Rank 78] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.358902  3711 ProcessGroupNCCL.cpp:835] [Rank 78] NCCL watchdog thread started!
I1109 17:21:30.359069  1469 ProcessGroupNCCL.cpp:669] [Rank 79] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.359118  3712 ProcessGroupNCCL.cpp:835] [Rank 79] NCCL watchdog thread started!
I1109 17:21:30.359551  1465 ProcessGroupNCCL.cpp:669] [Rank 76] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.359650  3713 ProcessGroupNCCL.cpp:835] [Rank 76] NCCL watchdog thread started!
I1109 17:21:30.366884 31377 ProcessGroupNCCL.cpp:835] [Rank 35] NCCL watchdog thread started!
I1109 17:21:30.366897 31378 ProcessGroupNCCL.cpp:835] [Rank 32] NCCL watchdog thread started!
I1109 17:21:30.366815 29129 ProcessGroupNCCL.cpp:669] [Rank 35] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.367022 29127 ProcessGroupNCCL.cpp:669] [Rank 33] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.367130 31379 ProcessGroupNCCL.cpp:835] [Rank 33] NCCL watchdog thread started!
I1109 17:21:30.366852 29125 ProcessGroupNCCL.cpp:669] [Rank 32] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.367161 29128 ProcessGroupNCCL.cpp:669] [Rank 34] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.367300 31380 ProcessGroupNCCL.cpp:835] [Rank 34] NCCL watchdog thread started!
I1109 17:21:30.369704 10007 ProcessGroupNCCL.cpp:835] [Rank 38] NCCL watchdog thread started!
I1109 17:21:30.369601  6739 ProcessGroupNCCL.cpp:669] [Rank 38] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.369699  6738 ProcessGroupNCCL.cpp:669] [Rank 37] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.369814 10008 ProcessGroupNCCL.cpp:835] [Rank 37] NCCL watchdog thread started!
I1109 17:21:30.369872  6736 ProcessGroupNCCL.cpp:669] [Rank 36] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.369959 10009 ProcessGroupNCCL.cpp:835] [Rank 36] NCCL watchdog thread started!
I1109 17:21:30.370020 10010 ProcessGroupNCCL.cpp:835] [Rank 39] NCCL watchdog thread started!
I1109 17:21:30.369932  6740 ProcessGroupNCCL.cpp:669] [Rank 39] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.417722 22934 ProcessGroupNCCL.cpp:669] [Rank 75] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.417771 25768 ProcessGroupNCCL.cpp:835] [Rank 75] NCCL watchdog thread started!
I1109 17:21:30.417698 22932 ProcessGroupNCCL.cpp:669] [Rank 73] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.417791 25767 ProcessGroupNCCL.cpp:835] [Rank 73] NCCL watchdog thread started!
I1109 17:21:30.417865 25769 ProcessGroupNCCL.cpp:835] [Rank 72] NCCL watchdog thread started!
I1109 17:21:30.417831 22930 ProcessGroupNCCL.cpp:669] [Rank 72] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.417909 22933 ProcessGroupNCCL.cpp:669] [Rank 74] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.418004 25770 ProcessGroupNCCL.cpp:835] [Rank 74] NCCL watchdog thread started!
I1109 17:21:30.493109 18596 ProcessGroupNCCL.cpp:669] [Rank 59] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.493232 20861 ProcessGroupNCCL.cpp:835] [Rank 59] NCCL watchdog thread started!
I1109 17:21:30.493769 18594 ProcessGroupNCCL.cpp:669] [Rank 57] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.493888 20862 ProcessGroupNCCL.cpp:835] [Rank 57] NCCL watchdog thread started!
I1109 17:21:30.493903 18592 ProcessGroupNCCL.cpp:669] [Rank 56] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.494010 20863 ProcessGroupNCCL.cpp:835] [Rank 56] NCCL watchdog thread started!
I1109 17:21:30.494335 20864 ProcessGroupNCCL.cpp:835] [Rank 58] NCCL watchdog thread started!
I1109 17:21:30.494271 18595 ProcessGroupNCCL.cpp:669] [Rank 58] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.550258 30092 ProcessGroupNCCL.cpp:669] [Rank 85] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.550354   549 ProcessGroupNCCL.cpp:835] [Rank 85] NCCL watchdog thread started!
I1109 17:21:30.550293 30093 ProcessGroupNCCL.cpp:669] [Rank 86] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.550352   550 ProcessGroupNCCL.cpp:835] [Rank 86] NCCL watchdog thread started!
I1109 17:21:30.550855   551 ProcessGroupNCCL.cpp:835] [Rank 84] NCCL watchdog thread started!
I1109 17:21:30.550786 30090 ProcessGroupNCCL.cpp:669] [Rank 84] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.550830 30094 ProcessGroupNCCL.cpp:669] [Rank 87] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.550930   552 ProcessGroupNCCL.cpp:835] [Rank 87] NCCL watchdog thread started!
I1109 17:21:30.642760  4674 ProcessGroupNCCL.cpp:835] [Rank 93] NCCL watchdog thread started!
I1109 17:21:30.642658  1701 ProcessGroupNCCL.cpp:669] [Rank 93] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.642827  4675 ProcessGroupNCCL.cpp:835] [Rank 95] NCCL watchdog thread started!
I1109 17:21:30.642740  1703 ProcessGroupNCCL.cpp:669] [Rank 95] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.641224 23920 ProcessGroupNCCL.cpp:669] [Rank 71] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.641336 26730 ProcessGroupNCCL.cpp:835] [Rank 71] NCCL watchdog thread started!
I1109 17:21:30.643311  4676 ProcessGroupNCCL.cpp:835] [Rank 92] NCCL watchdog thread started!
I1109 17:21:30.643239  1700 ProcessGroupNCCL.cpp:669] [Rank 92] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.639176 11956 ProcessGroupNCCL.cpp:669] [Rank 53] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.639314 15082 ProcessGroupNCCL.cpp:835] [Rank 53] NCCL watchdog thread started!
I1109 17:21:30.641680 23918 ProcessGroupNCCL.cpp:669] [Rank 69] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.641765 26731 ProcessGroupNCCL.cpp:835] [Rank 69] NCCL watchdog thread started!
I1109 17:21:30.643702 20545 ProcessGroupNCCL.cpp:669] [Rank 10] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.641762 23916 ProcessGroupNCCL.cpp:669] [Rank 68] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.643746 20544 ProcessGroupNCCL.cpp:669] [Rank 9] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.643857 23401 ProcessGroupNCCL.cpp:835] [Rank 9] NCCL watchdog thread started!
I1109 17:21:30.643594  1702 ProcessGroupNCCL.cpp:669] [Rank 94] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.643815 23400 ProcessGroupNCCL.cpp:835] [Rank 10] NCCL watchdog thread started!
I1109 17:21:30.643663  4677 ProcessGroupNCCL.cpp:835] [Rank 94] NCCL watchdog thread started!
I1109 17:21:30.641896 26732 ProcessGroupNCCL.cpp:835] [Rank 68] NCCL watchdog thread started!
I1109 17:21:30.639495 11958 ProcessGroupNCCL.cpp:669] [Rank 55] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.639600 15083 ProcessGroupNCCL.cpp:835] [Rank 55] NCCL watchdog thread started!
I1109 17:21:30.643996 20546 ProcessGroupNCCL.cpp:669] [Rank 11] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.644090 23402 ProcessGroupNCCL.cpp:835] [Rank 11] NCCL watchdog thread started!
I1109 17:21:30.642045 26733 ProcessGroupNCCL.cpp:835] [Rank 70] NCCL watchdog thread started!
I1109 17:21:30.641996 23919 ProcessGroupNCCL.cpp:669] [Rank 70] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.644148 23403 ProcessGroupNCCL.cpp:835] [Rank 8] NCCL watchdog thread started!
I1109 17:21:30.644055 20542 ProcessGroupNCCL.cpp:669] [Rank 8] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.639706 11954 ProcessGroupNCCL.cpp:669] [Rank 52] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.639847 15084 ProcessGroupNCCL.cpp:835] [Rank 52] NCCL watchdog thread started!
I1109 17:21:30.639868 11957 ProcessGroupNCCL.cpp:669] [Rank 54] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.639973 15085 ProcessGroupNCCL.cpp:835] [Rank 54] NCCL watchdog thread started!
I1109 17:21:30.651217 30003 ProcessGroupNCCL.cpp:669] [Rank 25] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.651326   476 ProcessGroupNCCL.cpp:835] [Rank 25] NCCL watchdog thread started!
I1109 17:21:30.651661 30004 ProcessGroupNCCL.cpp:669] [Rank 26] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.651758   477 ProcessGroupNCCL.cpp:835] [Rank 26] NCCL watchdog thread started!
I1109 17:21:30.651813 30001 ProcessGroupNCCL.cpp:669] [Rank 24] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.651917   478 ProcessGroupNCCL.cpp:835] [Rank 24] NCCL watchdog thread started!
I1109 17:21:30.652141 30005 ProcessGroupNCCL.cpp:669] [Rank 27] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.652227   479 ProcessGroupNCCL.cpp:835] [Rank 27] NCCL watchdog thread started!
I1109 17:21:30.644507 29745 ProcessGroupNCCL.cpp:669] [Rank 14] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.644623 32747 ProcessGroupNCCL.cpp:835] [Rank 13] NCCL watchdog thread started!
I1109 17:21:30.644559 29744 ProcessGroupNCCL.cpp:669] [Rank 13] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.644613 32746 ProcessGroupNCCL.cpp:835] [Rank 14] NCCL watchdog thread started!
I1109 17:21:30.644968 29742 ProcessGroupNCCL.cpp:669] [Rank 12] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.645058 32748 ProcessGroupNCCL.cpp:835] [Rank 12] NCCL watchdog thread started!
I1109 17:21:30.645152 29746 ProcessGroupNCCL.cpp:669] [Rank 15] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.645275 32749 ProcessGroupNCCL.cpp:835] [Rank 15] NCCL watchdog thread started!
I1109 17:21:30.656955 18608 ProcessGroupNCCL.cpp:669] [Rank 45] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.657056 21575 ProcessGroupNCCL.cpp:835] [Rank 45] NCCL watchdog thread started!
I1109 17:21:30.657174 18606 ProcessGroupNCCL.cpp:669] [Rank 44] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.657289 21576 ProcessGroupNCCL.cpp:835] [Rank 44] NCCL watchdog thread started!
I1109 17:21:30.657334 21577 ProcessGroupNCCL.cpp:835] [Rank 46] NCCL watchdog thread started!
I1109 17:21:30.657248 18609 ProcessGroupNCCL.cpp:669] [Rank 46] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.657421 18610 ProcessGroupNCCL.cpp:669] [Rank 47] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.657524 21578 ProcessGroupNCCL.cpp:835] [Rank 47] NCCL watchdog thread started!
I1109 17:21:30.667780   531 ProcessGroupNCCL.cpp:669] [Rank 17] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.667881  3569 ProcessGroupNCCL.cpp:835] [Rank 17] NCCL watchdog thread started!
I1109 17:21:30.667886   529 ProcessGroupNCCL.cpp:669] [Rank 16] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.668004  3570 ProcessGroupNCCL.cpp:835] [Rank 16] NCCL watchdog thread started!
I1109 17:21:30.668090   533 ProcessGroupNCCL.cpp:669] [Rank 19] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.668192  3571 ProcessGroupNCCL.cpp:835] [Rank 19] NCCL watchdog thread started!
I1109 17:21:30.668262   532 ProcessGroupNCCL.cpp:669] [Rank 18] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.668382  3572 ProcessGroupNCCL.cpp:835] [Rank 18] NCCL watchdog thread started!
I1109 17:21:30.678879  3352 ProcessGroupNCCL.cpp:835] [Rank 6] NCCL watchdog thread started!
I1109 17:21:30.678824  1100 ProcessGroupNCCL.cpp:669] [Rank 6] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.679248  1097 ProcessGroupNCCL.cpp:669] [Rank 5] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.679388  3353 ProcessGroupNCCL.cpp:835] [Rank 5] NCCL watchdog thread started!
I1109 17:21:30.679386  3354 ProcessGroupNCCL.cpp:835] [Rank 7] NCCL watchdog thread started!
I1109 17:21:30.679355  1101 ProcessGroupNCCL.cpp:669] [Rank 7] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.679529  1095 ProcessGroupNCCL.cpp:669] [Rank 4] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.679600  3355 ProcessGroupNCCL.cpp:835] [Rank 4] NCCL watchdog thread started!
I1109 17:21:30.680392 17855 ProcessGroupNCCL.cpp:669] [Rank 22] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.680505 20699 ProcessGroupNCCL.cpp:835] [Rank 22] NCCL watchdog thread started!
I1109 17:21:30.680585 17854 ProcessGroupNCCL.cpp:669] [Rank 21] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.680720 20700 ProcessGroupNCCL.cpp:835] [Rank 21] NCCL watchdog thread started!
I1109 17:21:30.680688 17856 ProcessGroupNCCL.cpp:669] [Rank 23] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.680776 20701 ProcessGroupNCCL.cpp:835] [Rank 23] NCCL watchdog thread started!
I1109 17:21:30.680795 17852 ProcessGroupNCCL.cpp:669] [Rank 20] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.680944 20702 ProcessGroupNCCL.cpp:835] [Rank 20] NCCL watchdog thread started!
I1109 17:21:30.695470 27562 ProcessGroupNCCL.cpp:669] [Rank 41] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.695573 30419 ProcessGroupNCCL.cpp:835] [Rank 41] NCCL watchdog thread started!
I1109 17:21:30.695571 27563 ProcessGroupNCCL.cpp:669] [Rank 42] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.695693 30420 ProcessGroupNCCL.cpp:835] [Rank 42] NCCL watchdog thread started!
I1109 17:21:30.695807 30421 ProcessGroupNCCL.cpp:835] [Rank 43] NCCL watchdog thread started!
I1109 17:21:30.695744 27564 ProcessGroupNCCL.cpp:669] [Rank 43] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.696017 27561 ProcessGroupNCCL.cpp:669] [Rank 40] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 0
NCCL_DESYNC_DEBUG: 0
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
I1109 17:21:30.696153 30422 ProcessGroupNCCL.cpp:835] [Rank 40] NCCL watchdog thread started!
I1109 17:21:31.758546 13221 ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: INFO
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:44, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:50, 34.20s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:48, 34.20s/it]
  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:44, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]
  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]
  0%|          | 
1/420 [00:34<3:58:43, 34.18s/it]  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]  0%|          | 1/420 [00:34<3:58:50, 34.20s/it]  0%|          | 1/420 [00:34<3:58:48, 34.20s/it]  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]  0%|          | 1/420 [00:34<3:58:42, 34.18s/it]  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]  0%|          | 1/420 [00:34<3:58:43, 34.18s/it]  0%|          | 1/420 [00:34<3:58:43, 34.19s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:32, 20.32s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:30, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:30, 20.31s/it]  0%|          | 2/420 [00:44<2:21:31, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]                                                                                                                                                                                                               
                                                                                                                                                                                           0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:32, 20.32s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]                                                                                                    0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:31, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:30, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]                                                                                                                                                                                                      0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:30, 
20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  0%|          | 2/420 [00:44<2:21:29, 20.31s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:23, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             1%|          | 3/420 
[00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]                                                                                                                                                                                                                                                                                                                                                         1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:23, 15.88s/it]                                                                                                    1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]                                                                                                                                                                                                      1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:21, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 3/420 [00:55<1:50:22, 15.88s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:39, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|   
       | 4/420 [01:05<1:35:39, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:39, 13.80s/it]  1%|          | 4/420 [01:05<1:35:39, 13.80s/it]  1%|          | 4/420 [01:05<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]  1%|          | 4/420 [01:05<1:35:40, 13.80s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:39, 13.80s/it]                                                                                            
                                                                                                          1%|          | 4/420 [01:06<1:35:40, 13.80s/it]                                                                                                    1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]                                                                                                    1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]                                                   1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:39, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 4/420 [01:06<1:35:40, 13.80s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 
[01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:29, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]                                                                                                                                                                                                                                                                                                                                                        
                                                  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]                                                   1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:29, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|          | 5/420 [01:16<1:27:28, 12.65s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 
12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]                
                                   1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:11, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.05s/it]  1%|▏         | 6/420 [01:27<1:23:10, 12.06s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]                                                               
                                                                                                                                                                                        2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]                                                   2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]                                                                                                    2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]                                                                                                    2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 
11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:44, 11.58s/it]                                                   2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 7/420 [01:38<1:19:43, 11.58s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:23, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]                                                                                                                                                                                                                                                                                                        2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:23, 11.27s/it]                                                                                                                                                                                                      2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]                                                   2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]                                                   2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏         | 8/420 [01:48<1:17:24, 11.27s/it]  2%|▏        
 | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:46, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                       2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]                                                                                                    2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:46, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 9/420 [01:59<1:15:47, 11.06s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 
[02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]                                                    2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                         2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  2%|▏         | 10/420 [02:09<1:14:37, 10.92s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 
11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎    
     | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]                                                    3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.83s/it]  3%|▎         | 11/420 [02:20<1:13:47, 10.82s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         
| 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]                                                                                                                                                        3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]                                                                                                                                                                                                                                                               
                                                                                                 3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 12/420 [02:31<1:13:09, 10.76s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]                                                              
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]                                                                                                                                                                                                                                                            3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  
3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]                                                                                                      3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 13/420 [02:41<1:12:40, 10.71s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                            3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]                                                                                                                                                                                                                                                                                                              3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]                                                                                                      3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]                                                    3%|▎         | 14/420 [02:52<1:12:16, 
10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  3%|▎         | 14/420 [02:52<1:12:16, 10.68s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                       4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]                                                                                                                                                        4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]                                                                                                      4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▎         | 15/420 [03:02<1:11:56, 10.66s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 
[03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                        4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]                                                    4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 16/420 [03:13<1:11:40, 10.64s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 
17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]                                                                                                      4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍  
       | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]                                                                                                                                                        4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 17/420 [03:24<1:11:25, 10.63s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         
| 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]                                                                                                                                                                                                                                                                                                              4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]                                                                                                      4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  
4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  4%|▍         | 18/420 [03:34<1:11:28, 10.67s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]  5%|▍         | 19/420 [03:45<1:11:28, 10.70s/it]  5%|▍       
  5%|▍         | 19/420 [03:45<1:11:28, 10.69s/it]
  5%|▍         | 20/420 [03:56<1:11:23, 10.71s/it]
  5%|▌         | 21/420 [04:07<1:11:18, 10.72s/it]
  5%|▌         | 22/420 [04:17<1:11:13, 10.74s/it]
  5%|▌         | 23/420 [04:28<1:11:05, 10.74s/it]
  6%|▌         | 24/420 [04:39<1:10:55, 10.75s/it]
  6%|▌         | 25/420 [04:50<1:10:45, 10.75s/it]
  6%|▌         | 26/420 [05:01<1:10:38, 10.76s/it]
  6%|▋         | 27/420 [05:11<1:10:28, 10.76s/it]
  7%|▋         | 28/420 [05:22<1:10:17, 10.76s/it]
  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]
                                                                                                                                                                                             7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]                                                                                                                                                        7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:07, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]                                                    7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]  7%|▋         | 29/420 [05:33<1:10:06, 10.76s/it]╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
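The innermost frame in these tracebacks is the dynamic loss scaler in `deepspeed/runtime/fp16/loss_scaler.py` (`update_scale`). The following is a minimal, standalone sketch of the update rule visible in that frame. It is for illustration only and is not DeepSpeed's class. `ToyDynamicLossScaler` is hypothetical, as are the parameter names `scale_factor`, `scale_window` and `raise_at_min_scale`, which stand in for attributes that are truncated or not shown in the frame. The clamp to `min_scale` and the growth branch are assumptions based on standard dynamic loss scaling, and logging and other options are omitted.

```python
class ToyDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (not DeepSpeed's implementation)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor        # divisor on overflow, multiplier on growth
        self.scale_window = scale_window        # overflow-free steps before growing (assumed)
        self.min_scale = min_scale              # floor for the scale
        self.delayed_shift = delayed_shift      # hysteresis: overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.raise_at_min_scale = raise_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # No hysteresis left: shrink the scale, or give up if already at the floor.
                if self.cur_scale == self.min_scale and self.raise_at_min_scale:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Spend one unit of hysteresis instead of shrinking the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window without overflow: restore hysteresis and grow the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1


scaler = ToyDynamicLossScaler(init_scale=4.0, min_scale=1.0, delayed_shift=1)
scaler.update_scale(True)   # 4.0 -> 2.0
scaler.update_scale(True)   # 2.0 -> 1.0
# A third overflow at the floor would raise the same exception as in the log.
```

In the traced run, ZeRO stage 3's `step` feeds its overflow flag into this update through `_overflow_check_and_loss_scale_update` and `_update_scale`, and (per the frame at stage3.py:1783) returns early without updating weights when an overflow is detected. Read this way, the exception means fp16 overflows kept recurring until the scale sat at its minimum, and one more overflow occurred there.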
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
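The traceback ends in `update_scale` in `deepspeed/runtime/fp16/loss_scaler.py`. On an overflow step where hysteresis is exhausted (`delayed_shift == 1` or `cur_hysteresis == 1`), the scale is divided by `scale_factor`. If the scale is already at `min_scale` and error raising is enabled, the run aborts with this exception. The sketch below is a simplified, hypothetical stand-in reconstructed from the visible lines 171-177. It is not DeepSpeed's actual class. The class name, the defaults, the growth branch for clean steps, and the `simulate_persistent_overflow` helper are all assumptions for illustration. It shows why gradients that overflow on every step (for example persistent inf/NaN in fp16) drive the scale to its floor within a few steps and then trigger the error.

```python
class DynamicLossScalerSketch:
    """Simplified, hypothetical stand-in for the fp16 dynamic loss scaler.

    Mirrors the overflow branch visible in the traceback
    (deepspeed/runtime/fp16/loss_scaler.py, lines 171-177); the growth branch
    and all defaults are assumptions for illustration.
    """

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_error_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift      # overflows tolerated before the first shrink
        self.cur_hysteresis = delayed_shift
        self.raise_error_at_min_scale = raise_error_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, has_overflow):
        if has_overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: shrink the scale, or abort if already at the floor.
                if self.cur_scale == self.min_scale and self.raise_error_at_min_scale:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Spend one unit of hysteresis instead of shrinking the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window of clean steps: restore hysteresis and grow the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1


def simulate_persistent_overflow(scaler, max_steps=100):
    """Report an overflow on every step (e.g. inf/NaN gradients) and record the scale.

    Returns (scales seen after each successful update, step that raised or None).
    """
    history = []
    for step in range(max_steps):
        try:
            scaler.update_scale(has_overflow=True)
        except Exception:
            return history, step
        history.append(scaler.cur_scale)
    return history, None


if __name__ == "__main__":
    history, failed_at = simulate_persistent_overflow(
        DynamicLossScalerSketch(init_scale=8.0, delayed_shift=2))
    print(history, failed_at)  # [8.0, 4.0, 2.0, 1.0] 4
```

With `init_scale=8.0` and `delayed_shift=2`, the first overflow only consumes hysteresis. The next three halve the scale down to 1.0, and the fifth raises. Starting from the sketch's assumed default of `2.0 ** 16`, the same loop raises at step 17. This suggests that in a run like the one above, overflows kept recurring without a full `scale_window` of clean steps in between, rather than coming from a single bad batch.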
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
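The exception comes from DeepSpeed's fp16 dynamic loss scaler, in the `update_scale` frame of loss_scaler.py (lines 171-177). The sketch below is a simplified, hypothetical re-implementation of that logic. It is based only on what the frame shows and on one assumption about growth: on overflow, the scale is divided by `scale_factor` once hysteresis (`delayed_shift`) is used up, and the run raises when `cur_scale` already equals `min_scale`. `SimpleDynamicLossScaler` and its defaults are illustrative. They are not DeepSpeed's API, and the grow-after-`scale_window` branch is an assumption about the parts of the method not visible in the traceback.

```python
class SimpleDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (not DeepSpeed's class)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_error_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift      # overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.raise_error_at_min_scale = raise_error_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: the scale must shrink, or the run aborts.
                if self.cur_scale == self.min_scale and self.raise_error_at_min_scale:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without shrinking the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # Assumed: a full window with no overflow restores hysteresis and grows the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1


scaler = SimpleDynamicLossScaler(init_scale=4.0, min_scale=1.0, delayed_shift=1)
for step in range(3):
    try:
        scaler.update_scale(overflow=True)   # every step overflows
        print(step, scaler.cur_scale)
    except Exception as exc:
        print(step, exc)
```

With overflows on every step, the scale halves until it reaches `min_scale`, and the next overflow raises the same message seen in this log. Seeing that message on every rank therefore suggests that gradients kept overflowing in fp16 even at the smallest allowed scale. The sketch shows the mechanism only; it does not by itself identify the root cause.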
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
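This exception is raised by DeepSpeed's dynamic fp16 loss scaler. An overflow was detected while the loss scale was already at its floor (`self.cur_scale == self.min_scale` at `loss_scaler.py:172`). At that point the scale cannot be halved again, so the run aborts. The sketch below is a simplified, hypothetical reimplementation of that control flow, written only to show why repeated overflows end in this error. It is not DeepSpeed's code. The class names and the parameters `init_scale`, `scale_window`, and `abort_at_min_scale` are assumptions for illustration. `delayed_shift`, `cur_hysteresis`, `scale_factor`, and `min_scale` mirror the names visible in the traceback.

```python
class MinScaleReached(Exception):
    """Hypothetical error: overflow while the scale is already at its floor."""


class SimpleDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (NOT DeepSpeed's class).

    On overflow: once hysteresis is exhausted, divide the scale by
    `scale_factor`, but abort if it is already at `min_scale`.
    Without overflow: multiply the scale by `scale_factor` after
    `scale_window` overflow-free steps.
    """

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=1, abort_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift      # overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.abort_at_min_scale = abort_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # No hysteresis left: shrink the scale, or give up at the floor.
                if self.cur_scale == self.min_scale and self.abort_at_min_scale:
                    raise MinScaleReached(
                        "Current loss scale already at minimum - "
                        "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without changing the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window without overflow: reset hysteresis and grow the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1


if __name__ == "__main__":
    scaler = SimpleDynamicLossScaler(init_scale=4.0, min_scale=1.0)
    scaler.update_scale(overflow=True)   # 4.0 -> 2.0
    scaler.update_scale(overflow=True)   # 2.0 -> 1.0 (now at the floor)
    print(scaler.cur_scale)              # 1.0
    try:
        scaler.update_scale(overflow=True)
    except MinScaleReached as err:
        print(err)
```

The sketch shows that the error is the end state of a run of overflows, not a one-off. Every overflowing step pushes the scale down, and once it reaches `min_scale` the next overflow is fatal. This is why the usual places to look are the source of the overflows (for example fp16 range limits or a diverging loss) rather than the scaler itself.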
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
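The exception above is raised by DeepSpeed's fp16 loss scaler (`loss_scaler.py:174`). Judging from the frames shown, when an overflow is detected and the hysteresis budget is used up (`delayed_shift == 1` or `cur_hysteresis == 1`), the scaler divides `cur_scale` by `scale_factor`, but raises instead once `cur_scale` already equals `min_scale`. The following is a simplified sketch of that dynamic loss-scaling logic, not DeepSpeed's actual class: the class name, the `raise_on_min` flag, the default values, and the growth rule after `scale_window` clean steps are assumptions for illustration.

```python
class DynamicLossScalerSketch:
    """Simplified dynamic loss scaler with hysteresis (illustrative, not DeepSpeed's class)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_on_min=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window      # clean steps required before growing the scale
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift    # overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.raise_on_min = raise_on_min      # hypothetical name for the "raise at minimum" switch
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # No hysteresis left: shrink the scale, or give up if already at the floor.
                if self.cur_scale == self.min_scale and self.raise_on_min:
                    raise RuntimeError("Current loss scale already at minimum")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without shrinking the scale yet.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window without overflow: restore hysteresis and grow the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1
```

Under this model, gradients that overflow on every step drive the scale down to `min_scale` and then end the run; since the overflow decision is shared across ranks, that is consistent with several ranks printing the same traceback in this log.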
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
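This exception comes from DeepSpeed's fp16 dynamic loss scaler. The excerpt of `loss_scaler.py` (lines 171-177) in the traceback shows the decision. When a step overflows and the hysteresis budget is used up (`delayed_shift == 1` or `cur_hysteresis == 1`), the scale is divided by `scale_factor` and floored at `min_scale`. If `cur_scale` is already equal to `min_scale`, the run is aborted. The sketch below is a simplified illustration of that logic and is not DeepSpeed's actual class. Only `cur_scale`, `min_scale`, `scale_factor`, `delayed_shift` and `cur_hysteresis` are taken from the traceback. `SimpleDynamicLossScaler`, `init_scale`, `scale_window`, `raise_at_min` and the scale-up rule are hypothetical names and simplifications.

```python
class SimpleDynamicLossScaler:
    """Simplified dynamic fp16 loss scaling with hysteresis (illustrative only).

    Hypothetical class: it mirrors the attribute names visible in the
    traceback, but it is not DeepSpeed's implementation.
    """

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_at_min=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window      # clean steps before trying a larger scale
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift    # overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.raise_at_min = raise_at_min      # hypothetical name for the abort switch
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, has_overflow):
        if has_overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: shrink the scale, or abort at the floor.
                if self.cur_scale == self.min_scale and self.raise_at_min:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without changing the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        else:
            # After scale_window overflow-free steps, try a larger scale again.
            if (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
                self.cur_hysteresis = self.delayed_shift
                self.cur_scale *= self.scale_factor
        self.cur_iter += 1


# Usage: gradients that overflow on every step drive the scale down to the floor,
# after which the next overflow raises the same message as in this log.
scaler = SimpleDynamicLossScaler(init_scale=8.0, min_scale=1.0, delayed_shift=2)
history, message = [], None
try:
    for _ in range(20):
        scaler.update_scale(has_overflow=True)
        history.append(scaler.cur_scale)
except Exception as exc:
    message = str(exc)
print(history)  # [8.0, 4.0, 2.0, 1.0]
print(message)
```

With this logic, a run only reaches the exception if overflows (inf/NaN gradients in fp16) keep occurring after the scale has already been halved down to `min_scale`. A single occasional overflow does not trigger it.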
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
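The loss_scaler.py:171-177 frame shows the rule that ended the run. On a gradient overflow, once hysteresis is used up (delayed_shift == 1 or cur_hysteresis == 1), the scale is divided by scale_factor, but never below min_scale. An overflow that happens while the scale is already at min_scale raises instead. Below is a minimal sketch of that dynamic loss-scaling rule. Several parts are assumptions for illustration and are not DeepSpeed's actual implementation: the class name SketchLossScaler, the default values, and the scale-growth branch, which the truncated frame does not show.

```python
class SketchLossScaler:
    """Hypothetical, simplified dynamic loss scaler modeled on the update_scale frame above."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=1, raise_error_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window        # clean steps required before the scale grows
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift      # overflows tolerated before shrinking (hysteresis)
        self.cur_hysteresis = delayed_shift
        self.raise_error_at_min_scale = raise_error_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: shrink the scale, or give up if it is already at the floor.
                if self.cur_scale == self.min_scale and self.raise_error_at_min_scale:
                    raise RuntimeError("Current loss scale already at minimum - "
                                       "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Spend one unit of hysteresis instead of shrinking the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # Assumed growth rule: after scale_window clean steps, reset hysteresis and grow.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1
```

In this sketch, a run that overflows on every step halves the scale down to min_scale and then raises, which matches the exception above. The knobs correspond loosely to DeepSpeed's fp16 config keys loss_scale_window, hysteresis, and min_loss_scale. This mapping is based on my understanding of the config and has not been verified against this run's settings.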
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
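Every rank fails in the same frame: DeepSpeed's fp16 dynamic loss scaler (`loss_scaler.py`, `update_scale`). Below is a simplified, standalone sketch of that logic, reconstructed only from the visible lines 171-177 of that frame. The class name `ToyDynamicLossScaler`, its default values, and the scale-growth branch are assumptions for illustration, not DeepSpeed's actual class. It shows why the run exits. With hysteresis, the first overflow is tolerated. After that, each consecutive overflow halves the scale until it reaches `min_scale`, and the next overflow raises the exception seen in the log.

```python
class ToyDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (not DeepSpeed's class)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_error_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift          # hysteresis budget
        self.cur_hysteresis = delayed_shift
        self.raise_error_at_min_scale = raise_error_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis budget spent: shrink the scale, or give up at the floor.
                if self.cur_scale == self.min_scale and self.raise_error_at_min_scale:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without shrinking; spend one hysteresis unit.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        else:
            # After scale_window clean steps, grow the scale and reset hysteresis.
            if (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
                self.cur_hysteresis = self.delayed_shift
                self.cur_scale *= self.scale_factor
        self.cur_iter += 1


if __name__ == "__main__":
    scaler = ToyDynamicLossScaler(init_scale=8.0, min_scale=1.0, delayed_shift=2)
    history = []
    try:
        for step in range(10):
            scaler.update_scale(overflow=True)  # simulate inf/NaN gradients every step
            history.append(scaler.cur_scale)
    except Exception as exc:
        print(f"step {step}: {exc}")
    print(history)  # [8.0, 4.0, 2.0, 1.0]
```

If gradients overflow on every step regardless of the scale, halving cannot help, so the scaler inevitably reaches the floor and aborts, as in the log above. In a real run, the corresponding knobs live in the DeepSpeed `fp16` config section (`min_loss_scale`, `hysteresis`, `loss_scale_window`).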
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:53, 11.95s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:54, 11.95s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
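The frames above end in the dynamic fp16 loss scaler in `deepspeed/runtime/fp16/loss_scaler.py`. Every step whose gradients contain inf/NaN is skipped. After the hysteresis budget (`delayed_shift` / `cur_hysteresis`) is used up, each further overflowing step halves `cur_scale`. Once the scale sits at `min_scale` and another overflow arrives, the run aborts with the exception shown. The block below is a minimal, simplified sketch of that logic, reconstructed from the visible frame. The class `ToyDynamicLossScaler` and the helper `steps_until_failure` are hypothetical and are not DeepSpeed's API.

```python
class ToyDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler with hysteresis (not DeepSpeed's class)."""

    def __init__(self, init_scale=2.0**16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_error_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window          # overflow-free steps before growing the scale
        self.min_scale = min_scale                # floor for the scale
        self.delayed_shift = delayed_shift        # hysteresis: delay before the scale is cut
        self.cur_hysteresis = delayed_shift
        self.raise_error_at_min_scale = raise_error_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: either give up or halve the scale (clamped at the floor).
                if self.cur_scale == self.min_scale and self.raise_error_at_min_scale:
                    raise RuntimeError("Current loss scale already at minimum - "
                                       "cannot decrease scale anymore.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Absorb this overflow without changing the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window without overflow: reset hysteresis and grow the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1


def steps_until_failure(scaler, max_steps=100):
    """Simulate gradients that overflow on every step; return the step index that raises."""
    for step in range(max_steps):
        try:
            scaler.update_scale(overflow=True)
        except RuntimeError:
            return step
    return None


if __name__ == "__main__":
    # Starting at 2**16 with hysteresis 2 and a floor of 1: one absorbed overflow,
    # sixteen halvings down to 1, then the abort.
    print(steps_until_failure(ToyDynamicLossScaler(init_scale=2.0**16)))  # prints 17
```

Under these assumptions, reaching the exception takes a run of consecutive overflowing steps with no clean window in between. That often points to gradients that are inf/NaN on every step, not to a scale that is only slightly too high.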
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
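One common mitigation is to train in bf16 when the hardware supports it. bf16 has the same exponent range as fp32, so no loss scaler is involved. If that is not possible, look for the source of the inf/NaN instead, for example the learning rate, the input data, or gradient clipping. The sketch below assumes the Hugging Face DeepSpeed integration, where `"auto"` values are filled in from `TrainingArguments`. The helper name `make_ds_config` is hypothetical. ZeRO stage 3 only mirrors the `zero/stage3.py` frames above, and the fp16 numbers are illustrative, not taken from this run.

```python
import json


def make_ds_config(zero_stage: int = 3) -> dict:
    """Hypothetical helper returning a DeepSpeed config dict for the HF Trainer.

    Values set to "auto" are filled in by the Hugging Face integration from
    TrainingArguments (e.g. --fp16 / --bf16, --max_grad_norm).
    """
    return {
        # Only one precision block ends up enabled, depending on --fp16 or --bf16.
        "fp16": {
            "enabled": "auto",
            "loss_scale": 0,            # 0 selects dynamic loss scaling
            "initial_scale_power": 16,  # initial scale = 2**16
            "loss_scale_window": 1000,  # overflow-free steps before the scale doubles
            "hysteresis": 2,            # delay, in overflowing steps, before the scale is cut
            "min_loss_scale": 1,        # an overflow at this floor raises the exception above
        },
        # bf16 keeps fp32's exponent range, so no dynamic loss scaling is used.
        "bf16": {"enabled": "auto"},
        "zero_optimization": {"stage": zero_stage},
        "gradient_clipping": "auto",
        "gradient_accumulation_steps": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


if __name__ == "__main__":
    # Print the config; save it as JSON to pass via --deepspeed.
    print(json.dumps(make_ds_config(), indent=2))
```

There are two ways to use it. You can pass the dict as `TrainingArguments(deepspeed=make_ds_config(), ...)`, or you can save it as JSON and pass the file path with `--deepspeed`. In either case, launch with `--bf16 True` instead of `--fp16 True` so the integration enables the bf16 block.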
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:54, 11.95s/it]╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:54, 11.95s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
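The frames above end inside DeepSpeed's `update_scale` (`deepspeed/runtime/fp16/loss_scaler.py`, lines 171-177). The sketch below is a simplified, self-contained illustration of dynamic loss scaling with hysteresis that follows the branch structure visible in that frame. It is not DeepSpeed's implementation. The class name `DynamicLossScalerSketch`, the default values, and the flag `raise_at_min` are hypothetical; the traceback truncates the real attribute name to `self.raise_e`, so the sketch does not guess it. On an overflow, the scale is divided by `scale_factor` only after the hysteresis counter has run down. If the scale is already at `min_scale`, the run aborts with the exception shown above.

```python
class DynamicLossScalerSketch:
    """Hypothetical, simplified dynamic loss scaler with hysteresis (illustration only)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_at_min=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window      # clean steps needed before growing the scale
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift    # overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.raise_at_min = raise_at_min      # hypothetical name for the "raise_e..." flag
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: shrink the scale, or abort if it cannot shrink further.
                if self.cur_scale == self.min_scale and self.raise_at_min:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow: keep the scale, count down the hysteresis.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window without overflow: reset hysteresis and grow the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1


if __name__ == "__main__":
    scaler = DynamicLossScalerSketch(init_scale=4.0, min_scale=1.0, delayed_shift=1)
    for _ in range(3):
        try:
            scaler.update_scale(overflow=True)
            print("scale ->", scaler.cur_scale)
        except Exception as exc:
            print("aborted:", exc)
```

Under this logic, the exception means overflows kept occurring after the scale had already been reduced to its floor. That suggests the fp16 gradients contain inf/NaN on their own, rather than the starting scale simply being too high.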
  7%|▋         | 29/420 [05:46<1:17:54, 11.95s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:54, 11.96s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
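The frames above end in the loss scaler's overflow branch (loss_scaler.py:174). The following Python sketch is a rough illustration only. It assumes simplified semantics reconstructed from the visible lines: on an overflowing step, the scale is divided by `scale_factor` once the hysteresis counter is exhausted, and the run aborts when the scale already equals `min_scale`. The growth window (`scale_window`), the flag `raise_at_min_scale` (a stand-in for the truncated `self.raise_e…`), the class name, and the `simulate` helper are all hypothetical. None of them are DeepSpeed's API.

```python
class DynamicLossScalerSketch:
    """Simplified dynamic loss scaler (hypothetical; not DeepSpeed's actual class)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=1, raise_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window              # clean steps before growing (hypothetical)
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift            # overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.raise_at_min_scale = raise_at_min_scale  # stand-in for the truncated flag
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: shrink, or abort if already at the floor.
                # (The traceback shows a plain Exception; RuntimeError is used here.)
                if self.cur_scale == self.min_scale and self.raise_at_min_scale:
                    raise RuntimeError(
                        "Current loss scale already at minimum - cannot decrease scale anymore.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without changing the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window of overflow-free steps: try a larger scale again.
            self.cur_scale *= self.scale_factor
            self.cur_hysteresis = self.delayed_shift
        self.cur_iter += 1


def simulate(overflows, **kwargs):
    """Feed per-step overflow flags and record the scale after each step."""
    scaler = DynamicLossScalerSketch(**kwargs)
    history = []
    for flag in overflows:
        scaler.update_scale(flag)
        history.append(scaler.cur_scale)
    return history


if __name__ == "__main__":
    # Persistent overflow drives the scale down to the floor, then the run aborts.
    try:
        simulate([True] * 20, init_scale=2.0 ** 4, min_scale=1.0)
    except RuntimeError as err:
        print("aborted:", err)
```

Under these assumptions, an unbroken run of overflowing steps walks the scale down to `min_scale`, and the next overflow raises. This matches the error reported above. A larger `delayed_shift` only postpones each halving. It does not stop the descent if every step overflows.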
  7%|▋         | 29/420 [05:46<1:17:54, 11.96s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:54, 11.96s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:54, 11.96s/it]
  7%|▋         | 29/420 [05:46<1:17:55, 11.96s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                  ��   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
            │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
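Every rank above fails in DeepSpeed's dynamic loss scaler (`loss_scaler.py:174`, `update_scale`), and the relevant source lines are cut off by the traceback renderer. To make the failure easier to read, here is a minimal, simplified Python sketch of that rule. It is reconstructed from the visible lines and from the general dynamic loss-scaling technique, and it is not DeepSpeed's actual code. The class name `DynamicLossScalerSketch`, the helper `steps_until_failure`, and the default values are hypothetical. Details such as consecutive hysteresis and rank-0 logging are omitted. The sketch shows why the run exits: once any hysteresis is spent, each overflowing step divides the scale by `scale_factor` until it reaches `min_scale`. One more overflow at that floor raises the exception seen in the log.

```python
class DynamicLossScalerSketch:
    """Hypothetical, simplified model of a dynamic loss scaler with hysteresis.

    Mirrors the branch structure visible in the traceback (delayed_shift,
    cur_hysteresis, min_scale, raise on minimum), not DeepSpeed's full logic.
    """

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=1, raise_error_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift
        self.cur_hysteresis = delayed_shift
        self.raise_error_at_min_scale = raise_error_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # No hysteresis left: shrink the scale, or give up at the floor.
                if self.cur_scale == self.min_scale and self.raise_error_at_min_scale:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without shrinking; spend one unit of hysteresis.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # A full window of clean steps: grow the scale and restore hysteresis.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1


def steps_until_failure(scaler, max_steps=100):
    """Feed consecutive overflows; return how many succeed before the raise."""
    for step in range(max_steps):
        try:
            scaler.update_scale(overflow=True)
        except Exception:
            return step
    return None
```

With these defaults (initial scale `2 ** 16`, factor 2, minimum 1), sixteen consecutive overflows bring the scale down to 1 and the seventeenth raises. So `steps_until_failure(DynamicLossScalerSketch())` returns 16. With `delayed_shift=2`, the first overflow only consumes hysteresis, so it returns 17. Hitting this error therefore means repeated overflows drove the scale down faster than clean windows could grow it back.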
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
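The exception above comes from DeepSpeed's fp16 dynamic loss scaler. The traceback shows only truncated fragments of `update_scale` (loss_scaler.py:171-177), so what follows is a simplified sketch under stated assumptions, not DeepSpeed's actual code. It shows how dynamic loss scaling with hysteresis can reach this error. Each overflow step first uses up the hysteresis budget, then divides the scale by `scale_factor`. Once the scale is at `min_scale`, one more overflow raises. The class `SimpleDynamicLossScaler`, its constructor parameters, and its defaults are hypothetical. Only the attribute names `cur_scale`, `min_scale`, `scale_factor`, `delayed_shift`, and `cur_hysteresis` are taken from the traceback.

```python
class SimpleDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler with hysteresis.

    Attribute names mirror those visible in the DeepSpeed traceback, but this
    is an illustrative sketch, not DeepSpeed's implementation.
    """

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, delayed_shift=2, raise_error_at_min_scale=True):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window          # clean steps needed before growing
        self.min_scale = min_scale
        self.delayed_shift = delayed_shift        # overflows tolerated before shrinking
        self.cur_hysteresis = delayed_shift
        self.raise_error_at_min_scale = raise_error_at_min_scale
        self.cur_iter = 0
        self.last_overflow_iter = -1

    def update_scale(self, overflow):
        if overflow:
            # Shrink only once the hysteresis budget is exhausted.
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                if self.cur_scale == self.min_scale and self.raise_error_at_min_scale:
                    raise Exception("Current loss scale already at minimum - "
                                    "cannot decrease scale anymore. Exiting run.")
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow (the step is skipped) without shrinking.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        else:
            # After scale_window overflow-free steps, restore hysteresis and grow.
            if (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
                self.cur_hysteresis = self.delayed_shift
                self.cur_scale *= self.scale_factor
        self.cur_iter += 1
```

For example, with `init_scale=4.0`, `min_scale=1.0`, and `delayed_shift=1`, three consecutive overflow steps take the scale from 4 to 2 to 1 and then raise the same message seen in this log. With `delayed_shift=2`, the first overflow only decrements `cur_hysteresis`. This lets an isolated overflow skip a step without shrinking the scale.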
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:55, 11.96s/it]
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
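Every rank fails at the same frame. On each step with an fp16 gradient overflow, `update_scale` in DeepSpeed's `loss_scaler.py` lowers the dynamic loss scale. Once the scale already equals its minimum, the next overflow raises this exception instead of skipping the step. The sketch below is a simplified, hypothetical re-implementation of that logic. The names `DynamicLossScalerSketch`, `LossScaleAtMinimum` and `simulate_persistent_overflow` are illustrative and are not DeepSpeed's API. It is modeled only on the branches visible in the frame above: an overflow first uses up the hysteresis budget, then divides the scale by `scale_factor`, and raises once the scale is at `min_scale`. A full window of clean steps multiplies the scale back up. If every step overflows (for example, inf/NaN gradients), the scale is halved down to the floor and the run aborts.

```python
from typing import List, Tuple


class LossScaleAtMinimum(Exception):
    """Hypothetical error type: overflow while the scale is already at its floor."""


class DynamicLossScalerSketch:
    """Simplified dynamic loss scaler (illustrative only, not DeepSpeed's class)."""

    def __init__(self, init_scale: float = 2.0 ** 16, scale_factor: float = 2.0,
                 scale_window: int = 1000, min_scale: float = 1.0,
                 hysteresis: int = 2) -> None:
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.hysteresis = hysteresis
        self.cur_hysteresis = hysteresis
        self.clean_steps = 0  # consecutive steps without overflow

    def update_scale(self, has_overflow: bool) -> None:
        if has_overflow:
            self.clean_steps = 0
            if self.cur_hysteresis > 1:
                # Tolerate this overflow: the step is skipped, the scale is kept.
                self.cur_hysteresis -= 1
                return
            if self.cur_scale == self.min_scale:
                # Nothing left to lower: give up, as in the traceback above.
                raise LossScaleAtMinimum(
                    "Current loss scale already at minimum - cannot decrease scale anymore.")
            # Lower the scale, but never below the floor.
            self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            return
        self.clean_steps += 1
        if self.clean_steps % self.scale_window == 0:
            # A full window without overflow: try a larger scale again.
            self.cur_scale *= self.scale_factor
            self.cur_hysteresis = self.hysteresis


def simulate_persistent_overflow(init_scale: float, steps: int) -> Tuple[List[float], bool]:
    """Report an overflow on every step; return the scale history and whether it aborted."""
    scaler = DynamicLossScalerSketch(init_scale=init_scale, min_scale=1.0, hysteresis=2)
    history: List[float] = []
    for _ in range(steps):
        try:
            scaler.update_scale(has_overflow=True)
        except LossScaleAtMinimum:
            return history, True
        history.append(scaler.cur_scale)
    return history, False


if __name__ == "__main__":
    # prints ([16.0, 8.0, 4.0, 2.0, 1.0], True)
    print(simulate_persistent_overflow(init_scale=16.0, steps=10))
```

In DeepSpeed's JSON config, the corresponding knobs live under the `fp16` section (`initial_scale_power`, `loss_scale_window`, `hysteresis`, `min_loss_scale`). Defaults and exact semantics may differ between DeepSpeed versions, so check the config documentation for the installed release.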
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:55, 11.96s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale 
anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:55, 11.96s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:55, 11.96s/it]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:159 in <module>                                                        │
│                                                                              │
│   156                                                                        │
│   157                                                                        │
│   158 if __name__ == "__main__":                                             │
│ ❱ 159 │   train()                                                            │
│   160                                                                        │
│                                                                              │
│ /public/home/zhaoying1/work/Baichuan2-main/fine-tune/slurm_script/../fine-tu │
│ ne.py:153 in train                                                           │
│                                                                              │
│   150 │   trainer = transformers.Trainer(                                    │
│   151 │   │   model=model, args=training_args, train_dataset=dataset, tokeni │
│   152 │   )                                                                  │
│ ❱ 153 │   trainer.train()                                                    │
│   154 │   trainer.save_state()                                               │
│   155 │   trainer.save_model(output_dir=training_args.output_dir)            │
│   156                                                                        │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1663 in train                                             │
│                                                                              │
│   1660 │   │   │   args=args,                                                │
│   1661 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1662 │   │   │   trial=trial,                                              │
│ ❱ 1663 │   │   │   ignore_keys_for_eval=ignore_keys_for_eval,                │
│   1664 │   │   )                                                             │
│   1665 │                                                                     │
│   1666 │   def _inner_training_loop(                                         │
│                                                                              │
│ /public/home/zhaoying1/install/transformers-temp/transformers-main/src/trans │
│ formers/trainer.py:1945 in _inner_training_loop                              │
│                                                                              │
│   1942 │   │   │   │                                                         │
│   1943 │   │   │   │   # Optimizer step for deepspeed must be called on ever │
│   1944 │   │   │   │   if self.deepspeed:                                    │
│ ❱ 1945 │   │   │   │   │   self.deepspeed.step()                             │
│   1946 │   │   │   │                                                         │
│   1947 │   │   │   │   if total_batched_samples % args.gradient_accumulation │
│   1948 │   │   │   │   │   # last step in epoch but step is always smaller t │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:2037 in step                               │
│                                                                              │
│   2034 │   │   │   │   │   and self.quantizer.any_precision_switch()):       │
│   2035 │   │   │   │   self._take_model_step(lr_kwargs, self.block_eigenvalu │
│   2036 │   │   │   else:                                                     │
│ ❱ 2037 │   │   │   │   self._take_model_step(lr_kwargs)                      │
│   2038 │   │   │                                                             │
│   2039 │   │   │   report_progress = self.global_rank == 0 if self.global_ra │
│   2040                                                                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/engine.py:1944 in _take_model_step                   │
│                                                                              │
│   1941 │   │   │   │   # https://nvidia.github.io/apex/advanced.html#gradien │
│   1942 │   │   │   │   master_params = amp.master_params(self.optimizer)     │
│   1943 │   │   │   │   clip_grad_norm_(parameters=master_params, max_norm=se │
│ ❱ 1944 │   │   self.optimizer.step()                                         │
│   1945 │   │                                                                 │
│   1946 │   │   if hasattr(self.optimizer, '_global_grad_norm'):              │
│   1947 │   │   │   self._global_grad_norm = self.optimizer._global_grad_norm │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1783 in step                          │
│                                                                              │
│   1780 │   │   self._partition_all_parameters()                              │
│   1781 │   │                                                                 │
│   1782 │   │   #checks for overflow, adjust the loss scale accordingly       │
│ ❱ 1783 │   │   if self._overflow_check_and_loss_scale_update():              │
│   1784 │   │   │   if self.swap_optimizer:                                   │
│   1785 │   │   │   │   self.optimizer_swapper.log_timers()                   │
│   1786 │   │   │   return                                                    │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/utils/nvtx.py:15 in wrapped_fn                               │
│                                                                              │
│   12 │                                                                       │
│   13 │   def wrapped_fn(*args, **kwargs):                                    │
│   14 │   │   get_accelerator().range_push(func.__qualname__)                 │
│ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                 │
│   16 │   │   get_accelerator().range_pop()                                   │
│   17 │   │   return ret_val                                                  │
│   18                                                                         │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:1733 in                               │
│ _overflow_check_and_loss_scale_update                                        │
│                                                                              │
│   1730 │   │                                                                 │
│   1731 │   │   #loss scaling related computation                             │
│   1732 │   │   prev_scale = self.loss_scale                                  │
│ ❱ 1733 │   │   self._update_scale(self.overflow)                             │
│   1734 │   │                                                                 │
│   1735 │   │   if self.overflow:                                             │
│   1736 │   │   │   self._overflow_clean_up(prev_scale)                       │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/zero/stage3.py:2048 in _update_scale                 │
│                                                                              │
│   2045 │   │   self._check_overflow(partition_gradients)                     │
│   2046 │                                                                     │
│   2047 │   def _update_scale(self, has_overflow=False):                      │
│ ❱ 2048 │   │   self.loss_scaler.update_scale(has_overflow)                   │
│   2049 │                                                                     │
│   2050 │   # Promote state so it can be retrieved or set via "fp16_optimizer │
│   2051 │   def _get_state(self):                                             │
│                                                                              │
│ /public/home/zhaoying1/anaconda3/envs/llmtorch110py37/lib/python3.7/site-pac │
│ kages/deepspeed/runtime/fp16/loss_scaler.py:174 in update_scale              │
│                                                                              │
│   171 │   │   │   if self.delayed_shift == 1 or self.cur_hysteresis == 1:    │
│   172 │   │   │   │   if (self.cur_scale == self.min_scale) and self.raise_e │
│   173 │   │   │   │   │   raise Exception(                                   │
│ ❱ 174 │   │   │   │   │   │   "Current loss scale already at minimum - canno │
│   175 │   │   │   │   else:                                                  │
│   176 │   │   │   │   │   next_scale = max(self.cur_scale / self.scale_facto │
│   177 │   │   │   │   │   if dist.get_rank() == 0:                           │
╰──────────────────────────────────────────────────────────────────────────────╯
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
  7%|▋         | 29/420 [05:46<1:17:56, 11.96s/it]
  7%|▋         | 29/420 [05:46<1:17:57, 11.96s/it]
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[49075,1],18]
  Exit code:    1
--------------------------------------------------------------------------