    [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes (#8791) · f8eda599
    Kristian Holsheimer authored
    * [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes
    
    * [FlaxRoberta] Fix non-broadcastable attention mask
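
    A minimal sketch, not the commit's actual diff, of why a 2-D padding mask stops broadcasting against the per-head attention scores as soon as the batch size exceeds one, and the usual expand-to-4-D fix; all shapes and names below are illustrative.

    ```python
    import jax.numpy as jnp

    batch, heads, seq = 2, 4, 8

    # Attention scores have shape (batch, heads, q_len, k_len).
    scores = jnp.zeros((batch, heads, seq, seq))

    # A raw padding mask from the tokenizer is (batch, k_len). Added to the
    # scores directly it only broadcasts when batch == 1; with batch == 2 the
    # mask's batch dimension lines up against the query-length dimension
    # (2 vs. 8) and broadcasting fails.
    mask = jnp.ones((batch, seq))

    # Expanding the mask to (batch, 1, 1, k_len) lets it broadcast over the
    # head and query dimensions for any batch size.
    bias = jnp.where(mask[:, None, None, :] > 0, 0.0, -1e10)
    masked_scores = scores + bias
    print(masked_scores.shape)  # (2, 4, 8, 8)
    ```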
    
    * Use jax.numpy instead of ordinary numpy (otherwise not jit-able)
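
    A small illustration of the jit constraint mentioned above: ordinary numpy tries to materialize JAX tracers as concrete arrays, which fails inside a `jax.jit`-compiled forward pass, so the mask manipulation has to stay in `jax.numpy`. The helper names here are hypothetical.

    ```python
    import numpy as np
    import jax
    import jax.numpy as jnp

    @jax.jit
    def expand_mask_jnp(mask):
        # jax.numpy ops are traceable, so this works under jit.
        return jnp.expand_dims(mask, axis=(-3, -2))

    @jax.jit
    def expand_mask_np(mask):
        # numpy tries to convert the tracer into a concrete array, which
        # JAX rejects when this function is traced.
        return np.expand_dims(mask, axis=(-3, -2))

    mask = jnp.ones((2, 8))
    print(expand_mask_jnp(mask).shape)  # (2, 1, 1, 8)
    # expand_mask_np(mask)              # would raise an error under jit
    ```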
    
    * Partially revert "Use jax.numpy ..."
    
    * Add tests for batched forward passes
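
    A hedged sketch of what a batched forward-pass check can look like, in the spirit of the test_modeling_flax_roberta.py file listed below; the checkpoint, sentences, and assertion are illustrative and not necessarily those added in this commit. Requires transformers with the Flax/JAX extras installed.

    ```python
    from transformers import BertTokenizerFast, FlaxBertModel

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = FlaxBertModel.from_pretrained("bert-base-uncased")

    # Two sentences of different lengths force padding, i.e. a genuinely
    # batched attention mask containing zeros.
    inputs = tokenizer(
        ["A short sentence.", "A somewhat longer sentence that needs padding."],
        padding=True,
        return_tensors="np",
    )

    # Before the fix, a batch size > 1 hit the broadcasting error sketched above.
    outputs = model(**inputs)
    assert outputs[0].shape[0] == 2  # batch dimension survives the forward pass
    ```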
    
    * Avoid unnecessary OOMs due to preallocation of GPU memory by XLA
    
    * Auto-fix style
    
    * Re-enable GPU memory preallocation but with mem fraction < 1/parallelism
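
    The two memory-related messages above map onto XLA's client-side allocator settings; a sketch of both knobs follows. The earlier message presumably disabled preallocation, the last one re-enables it with a capped fraction. The concrete fraction is an assumption, not a value taken from this commit, and the variables must be set before JAX initializes its GPU backend.

    ```python
    import os

    # What the earlier message suggests: disable preallocation entirely, so a
    # test process no longer grabs most of the GPU up front.
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

    # What the final message describes: keep preallocation but cap each process
    # at a fraction of GPU memory below 1 / (number of parallel workers).
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.2"  # assumed value

    import jax  # noqa: E402  (flags must be set before this import)
    ```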
test_modeling_flax_roberta.py 2.57 KB