    Better support for AMD multi-GPU on linux (#7212) · d7c94e0c
    Daniel Hiltgen authored
    * Better support for AMD multi-GPU
    
This resolves a number of problems related to AMD multi-GPU setups on Linux.
    
The numeric IDs used by ROCm are not the same as the numeric IDs exposed in
sysfs, although the ordering is consistent. To derive the ROCm ID, we count up
from zero, starting at the first node we find with a valid gfx version
(non-zero major/minor/patch values).
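A minimal Go sketch of that ID mapping, assuming each sysfs node has already been parsed into a hypothetical `sysfsNode` struct. It assumes "valid" means the gfx version is not 0.0.0 (not all three components zero), so nodes such as CPUs are skipped; the type and function names are illustrative, not the project's actual code.

```go
package main

import "fmt"

// sysfsNode is a hypothetical stand-in for one topology node read from
// sysfs; only the fields needed for the mapping are modeled.
type sysfsNode struct {
	SysfsID             int // node index as exposed in sysfs
	Major, Minor, Patch int // gfx version reported for the node
}

// rocmIDs maps sysfs node indices to ROCm numeric device IDs. Nodes are
// visited in sysfs order (the ordering matches ROCm's), nodes whose gfx
// version is 0.0.0 are skipped (assumed invalid), and the remaining GPUs
// are numbered from zero.
func rocmIDs(nodes []sysfsNode) map[int]int {
	ids := make(map[int]int)
	next := 0
	for _, n := range nodes {
		if n.Major == 0 && n.Minor == 0 && n.Patch == 0 {
			continue // not a valid gfx target, so it does not get a ROCm ID
		}
		ids[n.SysfsID] = next
		next++
	}
	return ids
}

func main() {
	nodes := []sysfsNode{
		{SysfsID: 0},                                // e.g. a CPU node reporting gfx 0.0.0
		{SysfsID: 1, Major: 11, Minor: 0, Patch: 0}, // first GPU
		{SysfsID: 2, Major: 10, Minor: 3, Patch: 0}, // second GPU
	}
	fmt.Println(rocmIDs(nodes)) // map[1:0 2:1]
}
```

Here sysfs node 1 becomes ROCm device 0 and sysfs node 2 becomes ROCm device 1, which is exactly the off-by-one mismatch the numeric IDs would otherwise suffer from.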
    
There are three different environment variables for selecting GPUs, and only
ROCR_VISIBLE_DEVICES supports UUID-based identification. We therefore favor that
variable and use UUIDs when they are detected, to avoid potential ordering bugs
with numeric IDs.
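The UUID preference can be sketched as follows. `gpuInfo` and `rocrVisibleDevices` are hypothetical names, the UUID strings in the example are placeholders, and falling back to numeric IDs for the whole list whenever any GPU lacks a UUID is a design choice of this sketch rather than something the commit specifies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpuInfo is a hypothetical description of a detected AMD GPU.
type gpuInfo struct {
	NumericID int    // ROCm numeric ID (ordinal)
	UUID      string // empty if no UUID was detected
}

// rocrVisibleDevices builds a comma-separated value for ROCR_VISIBLE_DEVICES.
// If every GPU has a UUID, UUIDs are used so the selection does not depend on
// device ordering; otherwise the whole list falls back to numeric IDs.
func rocrVisibleDevices(gpus []gpuInfo) string {
	allUUIDs := true
	for _, g := range gpus {
		if g.UUID == "" {
			allUUIDs = false
			break
		}
	}
	ids := make([]string, 0, len(gpus))
	for _, g := range gpus {
		if allUUIDs {
			ids = append(ids, g.UUID)
		} else {
			ids = append(ids, strconv.Itoa(g.NumericID))
		}
	}
	return strings.Join(ids, ",")
}

func main() {
	withUUIDs := []gpuInfo{{NumericID: 0, UUID: "GPU-aaaa"}, {NumericID: 1, UUID: "GPU-bbbb"}}
	fmt.Println(rocrVisibleDevices(withUUIDs)) // GPU-aaaa,GPU-bbbb

	mixed := []gpuInfo{{NumericID: 0, UUID: "GPU-aaaa"}, {NumericID: 1}}
	fmt.Println(rocrVisibleDevices(mixed)) // 0,1
}
```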
    
* ROCR_VISIBLE_DEVICES only works on Linux

On Windows, use HIP_VISIBLE_DEVICES, which accepts numeric IDs only.
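A minimal sketch of the platform check, keyed on Go's `runtime.GOOS` values. `visibleDevicesVar` is a hypothetical helper, and returning an empty name for platforms other than Linux and Windows is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"runtime"
)

// visibleDevicesVar reports which environment variable to use for GPU
// selection on the given OS, and whether that variable accepts UUIDs.
// ROCR_VISIBLE_DEVICES only works on Linux; on Windows, HIP_VISIBLE_DEVICES
// is used and accepts numeric IDs only.
func visibleDevicesVar(goos string) (name string, acceptsUUIDs bool) {
	switch goos {
	case "linux":
		return "ROCR_VISIBLE_DEVICES", true
	case "windows":
		return "HIP_VISIBLE_DEVICES", false
	default:
		// Other platforms are not handled in this sketch.
		return "", false
	}
}

func main() {
	name, uuids := visibleDevicesVar(runtime.GOOS)
	fmt.Printf("%q (UUIDs accepted: %v)\n", name, uuids)
}
```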
gpu.md