Fix saving FlaubertTokenizer configs (#14991)

Every tokenizer-specific config property must be passed to the base class (XLMTokenizer) in order to be saved. This was not done for the do_lowercase option, so save_pretrained() did not save it, and saving and then reloading the tokenizer changed its behaviour. This commit fixes that by passing do_lowercase to the base class constructor.

Commit 57b980a6 (unverified), authored Jan 11, 2022 by Vladimir Maryasin, committed by GitHub Jan 11, 2022

Showing 1 changed file with 1 addition and 1 deletion

src/transformers/models/flaubert/tokenization_flaubert.py +1 -1

--- a/src/transformers/models/flaubert/tokenization_flaubert.py
+++ b/src/transformers/models/flaubert/tokenization_flaubert.py
@@ -96,7 +96,7 @@ class FlaubertTokenizer(XLMTokenizer):
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

    def __init__(self, do_lowercase=False, **kwargs):
-        super().__init__(**kwargs)
+        super().__init__(do_lowercase=do_lowercase, **kwargs)
        self.do_lowercase = do_lowercase
        self.do_lowercase_and_remove_accent = False
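
The effect of the one-line change can be illustrated with a toy model of the save/reload round trip. This is only a sketch under stated assumptions: ToyBaseTokenizer, BuggyTokenizer, FixedTokenizer and round_trip are hypothetical names invented for the illustration, and the toy base class merely assumes that the base class remembers the keyword arguments it receives and that saving writes exactly those. It is not the real XLMTokenizer or transformers code.

```python
import json
import os
import tempfile


class ToyBaseTokenizer:
    """Hypothetical stand-in for a base tokenizer; not the real XLMTokenizer."""

    def __init__(self, **kwargs):
        # Assumption: only keyword arguments that reach the base class are remembered.
        self.init_kwargs = dict(kwargs)

    def save_pretrained(self, directory):
        # Persist exactly what the base class recorded at construction time.
        path = os.path.join(directory, "tokenizer_config.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.init_kwargs, f)

    @classmethod
    def from_pretrained(cls, directory):
        # Rebuild the tokenizer from the saved keyword arguments.
        path = os.path.join(directory, "tokenizer_config.json")
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


class BuggyTokenizer(ToyBaseTokenizer):
    def __init__(self, do_lowercase=False, **kwargs):
        super().__init__(**kwargs)  # do_lowercase never reaches the base class
        self.do_lowercase = do_lowercase


class FixedTokenizer(ToyBaseTokenizer):
    def __init__(self, do_lowercase=False, **kwargs):
        super().__init__(do_lowercase=do_lowercase, **kwargs)  # now recorded
        self.do_lowercase = do_lowercase


def round_trip(tokenizer_cls):
    """Build with do_lowercase=True, save, reload, and return the reloaded flag."""
    with tempfile.TemporaryDirectory() as tmp:
        tokenizer_cls(do_lowercase=True).save_pretrained(tmp)
        return tokenizer_cls.from_pretrained(tmp).do_lowercase


print(round_trip(BuggyTokenizer))  # False: the setting was lost on save
print(round_trip(FixedTokenizer))  # True: the setting survives the round trip
```

In the toy model the buggy subclass silently falls back to the default after reloading, which mirrors the behaviour change described in the commit message.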