f"Capture draft cuda graph end. Time elapsed: {time.perf_counter()-tic:.2f} s. mem usage={(before_mem-after_mem):.2f} GB. avail mem={after_mem:.2f} GB."
f"Capture draft extend cuda graph end. Time elapsed: {time.perf_counter()-tic:.2f} s. mem usage={(before_mem-after_mem):.2f} GB. avail mem={after_mem:.2f} GB."
# True only for overlap eagle and the current batch is decode. This seq will be part of the decode, so the final iteration's allocation is not used (i.e. this case).