Commit 6b0cce84 authored by Jingfei Du, committed by Facebook Github Bot

fix bug for masking prob (#758)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/758

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/603

Fixed a typo in _mask_block of the masked LM dataset. Because of the typo, a masked token was never replaced with a random token, which should happen for 10% of the masked tokens.

Reviewed By: akinh

Differential Revision: D15492315

fbshipit-source-id: 1e03dc862e23a6543e51d7401c74608d366ba62d
parent 6b3a516f
@@ -166,7 +166,7 @@ class MaskedLMDataset(FairseqDataset):
             # replace with random token if probability is less than
             # masking_prob + random_token_prob (Eg: 0.9)
-            elif rand < (self.masking_ratio + self.random_token_prob):
+            elif rand < (self.masking_prob + self.random_token_prob):
                 # sample random token from dictionary
                 masked_sent[i] = (
                     np.random.randint(
......
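
The effect of the one-word change can be checked with a small simulation of the branch logic. This is only a sketch, not the fairseq implementation: the helper branch_counts is hypothetical, the preceding "if rand < masking_prob" branch is inferred from the elif and its comment, and the values masking_ratio = 0.15 and masking_prob = 0.8 are assumptions (the diff only states the sum 0.9 and the commit message the 10% share).

```python
import random


def branch_counts(second_threshold, masking_prob=0.8, n=100_000, seed=0):
    """Count which branch each of n uniform draws in [0, 1) falls into.

    Hypothetical helper mirroring the if/elif/else shape around the
    changed line; it is not part of fairseq.
    """
    rng = random.Random(seed)
    counts = {"mask": 0, "random": 0, "keep": 0}
    for _ in range(n):
        rand = rng.random()
        if rand < masking_prob:
            counts["mask"] += 1      # replace with the mask token
        elif rand < second_threshold:
            counts["random"] += 1    # replace with a random token
        else:
            counts["keep"] += 1      # leave the token unchanged
    return counts


# Assumed values, for illustration only.
masking_ratio = 0.15      # fraction of tokens selected for masking
masking_prob = 0.8        # share of selected tokens that become the mask token
random_token_prob = 0.1   # share of selected tokens that become a random token

# Before the fix: the threshold is 0.25, below masking_prob, so the elif
# can never be true once the first branch has been skipped.
buggy = branch_counts(masking_ratio + random_token_prob, masking_prob)

# After the fix: the threshold is 0.9, so draws in [0.8, 0.9) hit the elif.
fixed = branch_counts(masking_prob + random_token_prob, masking_prob)

print("buggy:", buggy)
print("fixed:", fixed)
```

Under these assumptions the buggy threshold leaves the random-token branch unreachable, while the fixed threshold sends roughly a tenth of the draws to it.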