use __launch_bounds__(1024) for multi_tensor_apply, re-enable skipped tests