chenpangpang / transformers
Commit 9ca25ce8
Authored Apr 03, 2019 by Thomas Wolf
Committed by GitHub, Apr 03, 2019
Merge pull request #427 from jeonsworld/patch-1
fix sample_doc
Parents: db4dccd1, 60005f46
Showing 1 changed file with 1 addition and 1 deletion

examples/lm_finetuning/pregenerate_training_data.py
@@ -49,7 +49,7 @@ class DocumentDatabase:
                 self._precalculate_doc_weights()
             rand_start = self.doc_cumsum[current_idx]
             rand_end = rand_start + self.cumsum_max - self.doc_lengths[current_idx]
-            sentence_index = randint(rand_start, rand_end) % self.cumsum_max
+            sentence_index = randint(rand_start, rand_end - 1) % self.cumsum_max
             sampled_doc_index = np.searchsorted(self.doc_cumsum, sentence_index, side='right')
         else:
             # If we don't use sentence weighting, then every doc has an equal chance to be chosen
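
The one-character change looks like an off-by-one fix. The sketch below is one way to check that reading, under two assumptions not visible in this hunk: that `randint` is Python's `random.randint`, which includes both endpoints, and that `doc_cumsum` is the cumulative sum of `doc_lengths` with `cumsum_max` as its last entry. The helper `reachable_docs` and the sample lengths are hypothetical. It uses `itertools.accumulate` and `bisect.bisect_right` as standard-library stand-ins for `np.cumsum` and `np.searchsorted(..., side='right')`, and it enumerates every value `randint` could return rather than sampling.

```python
from bisect import bisect_right
from itertools import accumulate


def reachable_docs(doc_lengths, current_idx, inclusive_end):
    """Return the set of document indices the weighted branch can select.

    inclusive_end=True mimics randint(rand_start, rand_end) (before the fix);
    inclusive_end=False mimics randint(rand_start, rand_end - 1) (after it).
    """
    doc_cumsum = list(accumulate(doc_lengths))  # assumed layout of doc_cumsum
    cumsum_max = doc_cumsum[-1]                 # assumed: total sentence count
    rand_start = doc_cumsum[current_idx]
    rand_end = rand_start + cumsum_max - doc_lengths[current_idx]
    upper = rand_end if inclusive_end else rand_end - 1

    docs = set()
    # randint includes both endpoints, so enumerate rand_start..upper inclusive.
    for raw in range(rand_start, upper + 1):
        sentence_index = raw % cumsum_max
        # bisect_right matches searchsorted with side='right' on a sorted list.
        docs.add(bisect_right(doc_cumsum, sentence_index))
    return docs


doc_lengths = [3, 2, 4]  # hypothetical corpus: three documents
before = reachable_docs(doc_lengths, current_idx=1, inclusive_end=True)
after = reachable_docs(doc_lengths, current_idx=1, inclusive_end=False)
print(before, after)
```

Under these assumptions, the old upper bound wraps around to the first sentence of the current document, so the sampler could return `current_idx` itself; with `rand_end - 1` the window covers exactly the sentences of the other documents.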