• Xlnet outputs (#5881) · 13be4872
    Teven authored
    Slightly breaking change: this changes the behavior of `use_cache` in XLNet. If `use_cache` is True and `mem_len` is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns `mems` to be used as `past` in generation. At training time, `use_cache` is overridden and is always True.
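
The caching rule described in this commit message can be illustrated with a minimal, framework-free sketch. It is not the code from the commit: the function names `effective_use_cache` and `update_mems` are hypothetical, plain Python lists stand in for hidden-state tensors, and the truncation to the last `mem_len` entries when `mem_len` is set is an assumption that the message does not spell out.

```python
from typing import List, Optional


def effective_use_cache(use_cache: bool, training: bool) -> bool:
    """Hypothetical helper: at training time the flag is overridden to True."""
    return True if training else use_cache


def update_mems(
    prev_mem: Optional[List[int]],
    curr_out: List[int],
    mem_len: Optional[int],
    use_cache: bool,
    training: bool = False,
) -> Optional[List[int]]:
    """Hypothetical sketch of how `mems` could be built for one layer.

    Lists stand in for hidden-state tensors along the sequence axis.
    """
    if not effective_use_cache(use_cache, training):
        # Caching disabled: nothing is returned for later reuse.
        return None

    history = (prev_mem or []) + curr_out

    if not mem_len:
        # mem_len is 0 or None: keep the whole history, GPT-2 style,
        # so it can be passed back as `past` during generation.
        return history

    # Assumption: with a positive mem_len only the most recent
    # mem_len positions are kept.
    return history[-mem_len:]
```

With `mem_len=None` and `use_cache=True`, successive calls keep growing the returned history, which is what makes it usable as a generation cache. With a positive `mem_len`, the sketch keeps a fixed-size window instead.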
test_modeling_xlnet.py 28.9 KB