chenpangpang / transformers · Commits

Commit 6f1adc43 (unverified)
Authored Jul 08, 2021 by Sylvain Gugger, committed via GitHub on Jul 08, 2021
Fix group_lengths for short datasets (#12558)
parent 0a6b9048

Showing 10 changed files with 20 additions and 10 deletions
examples/flax/language-modeling/run_clm_flax.py  +2 -1
examples/flax/language-modeling/run_mlm_flax.py  +2 -1
examples/flax/language-modeling/run_t5_mlm_flax.py  +2 -1
examples/pytorch/language-modeling/run_clm.py  +2 -1
examples/pytorch/language-modeling/run_clm_no_trainer.py  +2 -1
examples/pytorch/language-modeling/run_mlm.py  +2 -1
examples/pytorch/language-modeling/run_mlm_no_trainer.py  +2 -1
examples/pytorch/language-modeling/run_plm.py  +2 -1
examples/tensorflow/language-modeling/run_clm.py  +2 -1
examples/tensorflow/language-modeling/run_mlm.py  +2 -1
examples/flax/language-modeling/run_clm_flax.py · View file @ 6f1adc43

@@ -398,6 +398,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // block_size) * block_size
+    if total_length >= block_size:
+        total_length = (total_length // block_size) * block_size
     # Split by chunks of max_len.
     result = {
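The change is the same in every script: the rounding down to a multiple of the block size is now guarded, so a dataset whose concatenated length is smaller than one block is kept whole instead of being truncated to zero length (which previously produced an empty grouped dataset). Below is a minimal, self-contained sketch of the shared grouping logic with the fix applied; it is simplified from the example scripts (the real `group_texts` closures also do things like copying `input_ids` to `labels`), and the sample batch is hypothetical.

```python
from itertools import chain


def group_texts(examples, block_size):
    # Concatenate all sequences in the batch, key by key, as the scripts do.
    concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # The fix: only round down to a multiple of block_size when there is at
    # least one full block; otherwise keep the short remainder as-is.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    # Split by chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    return result


# A batch whose concatenated length (5) is shorter than block_size (8).
# Before the fix, total_length was rounded down to 0 and the result was
# empty; with the guard, the single short chunk survives.
short_batch = {"input_ids": [[1, 2, 3], [4, 5]]}
print(group_texts(short_batch, block_size=8))  # {'input_ids': [[1, 2, 3, 4, 5]]}
```

For inputs longer than one block, behavior is unchanged: the trailing remainder is still dropped, e.g. nine tokens with `block_size=4` yield two chunks of four.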
examples/flax/language-modeling/run_mlm_flax.py · View file @ 6f1adc43

@@ -431,6 +431,7 @@ if __name__ == "__main__":
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // max_seq_length) * max_seq_length
+    if total_length >= max_seq_length:
+        total_length = (total_length // max_seq_length) * max_seq_length
     # Split by chunks of max_len.
     result = {
examples/flax/language-modeling/run_t5_mlm_flax.py · View file @ 6f1adc43

@@ -541,6 +541,7 @@ if __name__ == "__main__":
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // expanded_inputs_length) * expanded_inputs_length
+    if total_length >= expanded_inputs_length:
+        total_length = (total_length // expanded_inputs_length) * expanded_inputs_length
     # Split by chunks of max_len.
     result = {
examples/pytorch/language-modeling/run_clm.py · View file @ 6f1adc43

@@ -404,6 +404,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // block_size) * block_size
+    if total_length >= block_size:
+        total_length = (total_length // block_size) * block_size
     # Split by chunks of max_len.
     result = {
examples/pytorch/language-modeling/run_clm_no_trainer.py · View file @ 6f1adc43

@@ -343,6 +343,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // block_size) * block_size
+    if total_length >= block_size:
+        total_length = (total_length // block_size) * block_size
     # Split by chunks of max_len.
     result = {
examples/pytorch/language-modeling/run_mlm.py · View file @ 6f1adc43

@@ -433,6 +433,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // max_seq_length) * max_seq_length
+    if total_length >= max_seq_length:
+        total_length = (total_length // max_seq_length) * max_seq_length
     # Split by chunks of max_len.
     result = {
examples/pytorch/language-modeling/run_mlm_no_trainer.py · View file @ 6f1adc43

@@ -387,6 +387,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // max_seq_length) * max_seq_length
+    if total_length >= max_seq_length:
+        total_length = (total_length // max_seq_length) * max_seq_length
     # Split by chunks of max_len.
     result = {
examples/pytorch/language-modeling/run_plm.py · View file @ 6f1adc43

@@ -406,6 +406,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // max_seq_length) * max_seq_length
+    if total_length >= max_seq_length:
+        total_length = (total_length // max_seq_length) * max_seq_length
     # Split by chunks of max_len.
     result = {
examples/tensorflow/language-modeling/run_clm.py · View file @ 6f1adc43

@@ -405,6 +405,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // block_size) * block_size
+    if total_length >= block_size:
+        total_length = (total_length // block_size) * block_size
     # Split by chunks of max_len.
     result = {
examples/tensorflow/language-modeling/run_mlm.py · View file @ 6f1adc43

@@ -466,6 +466,7 @@ def main():
     total_length = len(concatenated_examples[list(examples.keys())[0]])
     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
     # customize this part to your needs.
-    total_length = (total_length // max_seq_length) * max_seq_length
+    if total_length >= max_seq_length:
+        total_length = (total_length // max_seq_length) * max_seq_length
     # Split by chunks of max_len.
     result = {