Fixing topology detection memory access and CU masking for multi-XCD GPUs (#116)

* Fixing potential out-of-bounds write during topology detection
* Fixing CU_MASK for multi-XCD GPUs
* Adding sub-iterations via NUM_SUBITERATIONS
* Adding support for variable subexecutor Transfers
* Adding healthcheck preset