Antoine Adam authored
According to the `setup.py` file, the only dependencies are torch and einops. However, the `bert_padding.py` file requires `numpy` solely to multiply the elements of a `torch.Size` object. This change allows FlashAttention to be used without numpy.
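A dependency-free replacement could look like the sketch below (an illustration of the idea, not necessarily the exact change made in `bert_padding.py`). Since `torch.Size` is a subclass of `tuple`, the standard-library `math.prod` computes the same product that `numpy.prod` would:

```python
import math

# torch.Size is a tuple subclass, so a plain tuple stands in for it here,
# e.g. the shape of torch.empty(2, 3, 4).
shape = (2, 3, 4)

# numpy-based version being removed:   np.prod(shape)
# dependency-free replacement:
numel = math.prod(shape)
print(numel)  # 24
```

`math.prod` is available from Python 3.8 onward; on older versions, `functools.reduce(operator.mul, shape, 1)` achieves the same result without numpy.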
4e38df05