fixes #32329 : The Torch code is correct - to get an average of 10% o… (#32335)

fixes #32329: The Torch code is correct. To replace an average of 10% of the total with a random word, we want to take 50% of the remainder after we've already replaced 80% with [MASK] in the previous step.
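The arithmetic behind the message can be checked with a small sketch. The helper `random_share` is a hypothetical name used only for illustration and is not part of `transformers`. It assumes the first step replaces 80% of the selected tokens with [MASK], as the message states, and that the second draw only takes effect on tokens the first step left alone.

```python
from fractions import Fraction


def random_share(p_mask: Fraction, p_second_draw: Fraction) -> Fraction:
    """Share of selected tokens that end up replaced by a random word.

    The second Bernoulli draw is only applied to tokens that were not
    already replaced with [MASK], so the two probabilities multiply.
    """
    return (1 - p_mask) * p_second_draw


p_mask = Fraction(8, 10)                        # 80% replaced with [MASK] first
before = random_share(p_mask, Fraction(1, 10))  # second draw at 0.1 (old TF value)
after = random_share(p_mask, Fraction(1, 2))    # second draw at 0.5 (this commit)

print(before, after)  # 1/50 1/10
```

With a second draw of 0.1, only 2% of the selected tokens would receive a random word; with 0.5 the share is the intended 10%.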

Unverified Commit 516af4bb authored Jul 30, 2024 by fkrasnov2 Committed by GitHub Jul 30, 2024

Showing with 1 addition and 1 deletion

src/transformers/data/data_collator.py +1 -1

--- a/src/transformers/data/data_collator.py
+++ b/src/transformers/data/data_collator.py
@@ -751,7 +751,7 @@ class DataCollatorForLanguageModeling(DataCollatorMixin):
        inputs = tf.where(indices_replaced, mask_token_id, inputs)

        # 10% of the time, we replace masked input tokens with random word
-        indices_random = self.tf_bernoulli(input_shape, 0.1) & masked_indices & ~indices_replaced
+        indices_random = self.tf_bernoulli(input_shape, 0.5) & masked_indices & ~indices_replaced
        random_words = tf.random.uniform(input_shape, maxval=vocab_size, dtype=inputs.dtype)

        inputs = tf.where(indices_random, random_words, inputs)
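The effect of the changed line can also be observed empirically. The following is a rough standard-library simulation, not the collator itself: `simulate` is a hypothetical helper, every simulated token is assumed to be already selected (`masked_indices` is true), and the first step is assumed to use a probability of 0.8 as described in the commit message.

```python
import random


def simulate(p_second_draw: float, n: int = 200_000, seed: int = 0) -> dict:
    """Estimate the share of selected tokens in each outcome.

    Assumption: every token here is already selected for masking, so the
    `masked_indices` term of the original expression is always true.
    """
    rng = random.Random(seed)
    counts = {"mask": 0, "random": 0, "unchanged": 0}
    for _ in range(n):
        # Step 1: replace with [MASK] with probability 0.8.
        replaced = rng.random() < 0.8
        # Step 2 mirrors: bernoulli(p) & masked_indices & ~indices_replaced
        randomized = (rng.random() < p_second_draw) and not replaced
        if replaced:
            counts["mask"] += 1
        elif randomized:
            counts["random"] += 1
        else:
            counts["unchanged"] += 1
    return {name: count / n for name, count in counts.items()}


old = simulate(0.1)  # second draw before this commit
new = simulate(0.5)  # second draw after this commit

print("old:", old)
print("new:", new)
```

Under these assumptions the "random" share should come out near 0.02 for the old value and near 0.10 for the new one, with the remaining tokens left unchanged.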