    llamarunner: Respect device ordering for offloaded layers · 4372d0bf
    Jesse Gross authored
    We used to control which devices llama.cpp saw using
    CUDA_VISIBLE_DEVICES or similar environment variables. This
    ensured that the layers offloaded to a device were actually
    the ones intended, which matters because we may reorder
    devices based on free memory or performance.
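
    As a rough illustration (not the actual ollama code), the old
    approach amounted to renumbering GPUs through the environment
    before llama.cpp initialized. reorderDevices below is a
    hypothetical stand-in for the scheduler's sorting logic:

        package main

        import (
            "fmt"
            "os"
            "sort"
            "strings"
        )

        // reorderDevices is a hypothetical helper: given free memory
        // per physical GPU, return the GPU indices sorted by
        // descending free memory.
        func reorderDevices(freeMem []uint64) []int {
            idx := make([]int, len(freeMem))
            for i := range idx {
                idx[i] = i
            }
            sort.Slice(idx, func(a, b int) bool {
                return freeMem[idx[a]] > freeMem[idx[b]]
            })
            return idx
        }

        func main() {
            // GPU 1 has the most free memory, so it should come first.
            order := reorderDevices([]uint64{4 << 30, 12 << 30, 8 << 30})

            ids := make([]string, len(order))
            for i, d := range order {
                ids[i] = fmt.Sprint(d)
            }

            // With CUDA_VISIBLE_DEVICES="1,2,0", device 0 inside the
            // llama.cpp process is physical GPU 1, and so on.
            os.Setenv("CUDA_VISIBLE_DEVICES", strings.Join(ids, ","))
            fmt.Println(os.Getenv("CUDA_VISIBLE_DEVICES"))
        }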
    
    When we started explicitly scheduling layers, this logic went
    away, but the llamarunner had no way to set the correct device
    order. As a result, the correct number of layers would be
    assigned to a device, but not necessarily the layers that
    were expected. This change sets up the devices correctly
    based on the offload information.
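
    A minimal sketch of the idea, assuming a hypothetical offload
    plan type (this is not the device.go implementation): derive
    the device ordering from the layer assignments themselves and
    hand llama.cpp that same ordering.

        package main

        import "fmt"

        // layerAssignment is hypothetical: one entry of the offload
        // plan, mapping a model layer to the device it should run on.
        type layerAssignment struct {
            layer  int
            device string // e.g. "CUDA0", "CUDA1"
        }

        // devicesInOffloadOrder derives the device list in the order
        // the scheduler assigned layers, so the runner can pass
        // llama.cpp the same ordering instead of relying on
        // CUDA_VISIBLE_DEVICES.
        func devicesInOffloadOrder(plan []layerAssignment) []string {
            seen := make(map[string]bool)
            var order []string
            for _, a := range plan {
                if !seen[a.device] {
                    seen[a.device] = true
                    order = append(order, a.device)
                }
            }
            return order
        }

        func main() {
            plan := []layerAssignment{
                {0, "CUDA1"}, {1, "CUDA1"}, {2, "CUDA0"}, {3, "CUDA0"},
            }
            // Prints [CUDA1 CUDA0]: the layers land on the GPUs the
            // scheduler chose, in the order it chose them.
            fmt.Println(devicesInOffloadOrder(plan))
        }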