Michaël Benesty authored
* Add link to new community notebook (optimization)

Related to https://github.com/huggingface/transformers/issues/4842#event-3469184635

This notebook benchmarks model training with and without the dynamic padding optimization.
https://github.com/ELS-RD/transformers-notebook

Using dynamic padding on MNLI provides a **4.7 times training time reduction**, with max pad length set to 512. The effect is strong because few examples in this dataset are much longer than 400 tokens. In practice, the gain will depend on the dataset, but it always brings an improvement and, after more than 20 experiments listed in this [article](https://towardsdatascience.com/divide-hugging-face-transformers-training-time-by-2-or-more-21bf7129db9q-21bf7129db9e?source=friends_link&sk=10a45a0ace94b3255643d81b6475f409), it does not seem to hurt performance.

Following advice from @patrickvonplaten, I am opening the PR myself :-)

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
0cca6192
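
For readers unfamiliar with the term, here is a minimal sketch of what dynamic padding means, assuming already-tokenized inputs represented as lists of token ids. It is not the code from the linked notebook. The helper `pad_batch`, the pad id `0`, and the sample token ids are hypothetical and chosen only for illustration.

```python
from typing import List


def pad_batch(
    batch: List[List[int]],
    pad_id: int = 0,       # hypothetical pad token id, for illustration only
    max_len: int = 512,    # max pad length, as in the benchmark described above
    dynamic: bool = True,
) -> List[List[int]]:
    """Pad token-id sequences in a batch to a common length.

    dynamic=False pads every sequence to max_len (fixed padding).
    dynamic=True pads only to the longest sequence in this batch.
    """
    # Truncate first so no sequence exceeds max_len in either mode.
    truncated = [seq[:max_len] for seq in batch]
    if dynamic:
        target = max(len(seq) for seq in truncated)
    else:
        target = max_len
    return [seq + [pad_id] * (target - len(seq)) for seq in truncated]


# Two short, made-up sequences of token ids.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]

fixed = pad_batch(batch, dynamic=False)
dyn = pad_batch(batch, dynamic=True)

# Fixed padding processes 512 positions per example; dynamic padding only 5 here.
print(len(fixed[0]), len(dyn[0]))
```

When most examples are far shorter than the max pad length, the per-batch target length is much smaller than 512, which is where the training time reduction comes from.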