    CUDA: filter devices on secondary discovery (#13317) · 3f308367
    Daniel Hiltgen authored
    We now run a deeper probe of CUDA devices to verify that the library version
    has the correct compute capability coverage for the device.  Because ROCm also
    interprets the CUDA env var to filter AMD devices, setting it causes problems
    in mixed-vendor systems, so we try to avoid setting it.  However, without it,
    each CUDA library subprocess in the deeper probe discovers all CUDA GPUs, and
    on systems with many GPUs this can lead to timeouts.  The fix is to set the
    CUDA visibility env var just for the deeper probe use case.
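A minimal sketch of the idea, not the code from this commit: build a per-probe environment that pins the CUDA visibility variable to a single device, so the probe subprocess enumerates only that GPU. The helper name `probeEnv` and the device ID value are hypothetical, and the sketch assumes the variable in question is `CUDA_VISIBLE_DEVICES`, which the commit message does not name explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// probeEnv is a hypothetical helper (not taken from device.go). It returns a
// copy of base with CUDA_VISIBLE_DEVICES set to deviceID, dropping any value
// that was already present. The base slice itself is left untouched, so the
// variable stays unset for everything except the deeper probe.
func probeEnv(base []string, deviceID string) []string {
	const key = "CUDA_VISIBLE_DEVICES=" // assumed name of the visibility env var
	env := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if !strings.HasPrefix(kv, key) {
			env = append(env, kv)
		}
	}
	return append(env, key+deviceID)
}

func main() {
	// Illustrative values only.
	base := []string{"PATH=/usr/bin", "CUDA_VISIBLE_DEVICES=0,1,2,3"}
	fmt.Println(probeEnv(base, "1")) // environment for the probe subprocess
	fmt.Println(base)                // original environment, unchanged
}
```

The returned slice could be assigned to the `Env` field of an `exec.Cmd` for the probe subprocess only, which keeps the filter scoped to that one use and away from any ROCm discovery path.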
device.go 17.3 KB