"Note: this varlen kernel performance is as good as the non-varlen kernel shown in Nsight-Compute. As you may observe that the TFLOPS is a bit lower, that's because the unpad operation is included in the above benchmark."
)
if __name__ == "__main__":
    arch = nvcc.get_target_compute_version()
    print(f"Detected GPU compute capability: {arch}")
    assert float(arch) >= 9.0, "This example only supports GPU with compute capability >= 9.0"