    Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517) · c5c69096
    Khai Mai authored
    * fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
    
    * format code using black and ruff
    
    * skip computing mask if attention_mask=None
    
* add tests for the load balancing loss in Mixtral-MoE
    
* fix the assertion that the losses differ in mixtral_test
    
    * fix pad_leng
    
    * use assertNotAlmostEqual and print to debug
    
    * remove print for debug
    
    * minor updates
    
    * reduce rtol and atol
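
The change routes attention_mask into the auxiliary load balancing loss so that padding tokens no longer contribute to the expert-usage statistics. Below is a minimal sketch of that idea; the function name, signature, and tensor shapes are illustrative assumptions, not the exact code merged into transformers.

```python
import torch


def load_balancing_loss_with_mask(gate_logits, num_experts, top_k, attention_mask=None):
    """Sketch of an auxiliary load balancing loss that ignores padding tokens.

    gate_logits: (batch * seq_len, num_experts) router logits for one MoE layer.
    attention_mask: optional (batch, seq_len) mask with 1 for real tokens, 0 for padding.
    """
    routing_weights = torch.softmax(gate_logits, dim=-1)
    # indices of the top-k experts selected for each token
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    # one-hot expert assignment, shape (tokens, top_k, num_experts)
    expert_mask = torch.nn.functional.one_hot(selected_experts, num_experts).float()

    if attention_mask is None:
        # no padding information: average over all tokens
        tokens_per_expert = expert_mask.mean(dim=0)
        router_prob_per_expert = routing_weights.mean(dim=0)
    else:
        # flatten the (batch, seq_len) mask to align with the token axis
        pad_mask = attention_mask.reshape(-1).float()
        denom = pad_mask.sum()
        # weight each token by its mask so padding tokens contribute nothing
        tokens_per_expert = (expert_mask * pad_mask[:, None, None]).sum(dim=0) / denom
        router_prob_per_expert = (routing_weights * pad_mask[:, None]).sum(dim=0) / denom

    # encourage a balanced load: product of usage fraction and router probability per expert
    loss = torch.sum(tokens_per_expert.mean(dim=0) * router_prob_per_expert)
    return loss * num_experts
```

With this weighting, two batches that contain the same real tokens but different amounts of padding yield the same auxiliary loss, which is what the added tests check (the losses with and without padding should differ only within rtol/atol when padding is correctly excluded).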