pytorch/mem_transformer.py · 6860c1bf6de72ef29f7c230e008c5488cdef8035 · OpenDAS / FastMoE

Jiezhong Qiu authored Dec 07, 2020

* When k=1, it reduces to torch.max, so it is not surprising that torch.max is
faster than torch.topk.
* However, when k=2, it is even slower than torch.topk.
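
The relationship described above can be illustrated with a minimal sketch. The commit message does not say which implementation "it" refers to, so the repeated-max approach below is an assumption for illustration only, not the code from this commit; the helper names `top1_via_max` and `topk_via_repeated_max` are hypothetical. The sketch shows that top-1 selection is the same as `torch.max`, and that one naive way to extend it to k=2 needs a second full pass over the scores, which may be one reason such an approach loses to `torch.topk` for k > 1.

```python
import torch


def top1_via_max(scores):
    """Top-1 selection along the last dimension, expressed with torch.max."""
    values, indices = torch.max(scores, dim=-1, keepdim=True)
    return values, indices


def topk_via_repeated_max(scores, k):
    """Hypothetical top-k built from k passes of torch.max.

    This is an illustrative assumption, not the implementation from the commit.
    Each pass takes the current maximum, then masks it out with -inf.
    """
    work = scores.clone()  # avoid modifying the caller's tensor
    all_values, all_indices = [], []
    for _ in range(k):
        values, indices = torch.max(work, dim=-1, keepdim=True)
        all_values.append(values)
        all_indices.append(indices)
        # Mask the selected entries so the next pass finds the runner-up.
        work.scatter_(-1, indices, float("-inf"))
    return torch.cat(all_values, dim=-1), torch.cat(all_indices, dim=-1)


torch.manual_seed(0)
scores = torch.randn(4, 8)  # e.g. 4 tokens, 8 candidates each (made-up sizes)

# k=1: torch.max and torch.topk select the same entries.
max_values, max_indices = top1_via_max(scores)
top1_values, top1_indices = torch.topk(scores, k=1, dim=-1)

# k=2: the repeated-max sketch matches torch.topk but needs two passes.
rep_values, rep_indices = topk_via_repeated_max(scores, k=2)
top2_values, top2_indices = torch.topk(scores, k=2, dim=-1)
```

The sketch only checks that the results agree; it makes no claim about timings, which depend on device, tensor shape, and PyTorch version.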

1feaaf0c

mem_transformer.py 43.7 KB
