    allow for `None` batch (#1280) · a960fe8c
    Masaki Kozuki authored
    * have get_kth_microbatch deal with None batch
    
    * broadcast based on tensor parallel rank
    
    * dtype
    
    * remove unnecessary .cuda()
    
    Processes with tensor parallel rank != 0 don't need to prepare `torch.utils.data.DataLoader` instances, which means the `batch` argument of the `get_kth_microbatch` function can be `None`, but the current implementation doesn't allow for it.
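
The paragraph above describes the change to `get_kth_microbatch`. As a rough illustration, the sketch below shows a micro-batch slicing helper that tolerates a `None` batch; the signature, the explicit `micro_batch_size` parameter, and the slicing along the first dimension are assumptions made for this example, not apex's actual implementation.

```python
from typing import List, Optional

import torch


def get_kth_microbatch(
    batch: Optional[List[torch.Tensor]], k: int, micro_batch_size: int
) -> Optional[List[torch.Tensor]]:
    """Return the k-th micro-batch slice of `batch`, or None if no batch was given.

    Ranks that do not own a DataLoader pass batch=None and obtain their data
    from a broadcast instead of slicing it locally.
    """
    if batch is None:
        # Nothing to slice on ranks without a data loader.
        return None
    start = k * micro_batch_size
    end = start + micro_batch_size
    # Slice every tensor in the batch along the batch dimension.
    return [tensor[start:end] for tensor in batch]
```

Returning early on `None` lets callers on non-source tensor-parallel ranks skip slicing entirely and rely on the broadcast step instead.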
utils.py 12.3 KB
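
The "broadcast based on tensor parallel rank" bullet in the commit message refers to the pattern where only the first rank of each tensor-parallel group reads data and broadcasts it to its peers. The sketch below is a hypothetical helper built on `torch.distributed.broadcast`; the function name, its parameters, and the assumption that all ranks agree on the shape and dtype beforehand are illustrative, not apex's API.

```python
import torch
import torch.distributed as dist


def broadcast_from_tp_src(tensor, shape, dtype, tp_group, tp_src_rank):
    """Broadcast one batch tensor from the tensor-parallel source rank.

    Non-source ranks pass tensor=None and allocate an empty receive buffer;
    shape and dtype are assumed to be agreed upon across ranks beforehand.
    """
    if tensor is None:
        # Ranks without a DataLoader only need a buffer to receive into.
        tensor = torch.empty(shape, dtype=dtype, device="cuda")
    dist.broadcast(tensor, src=tp_src_rank, group=tp_group)
    return tensor
```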