"examples/offline_inference/multilora_inference.py" did not exist on "35cc35014cac8abf2ccdf03f27c7c23aabbb8a98"
-
Benjamin Lefaudeux authored
Same bucketing strategy for OSS and SDP: sort everything ahead of time, per rank and per size, smaller tensors first. Bucket the smallest elements in a fixed buffer, send async, then send all the others async, and get back to the bucket. Once done then scatter the contents if needed
341d8b2b