Unverified commit bfcd5743, authored by Bill Ray, committed by GitHub

In `group_texts` function, drop last block if smaller than `block_size` (#17908)

parent f71895a6
@@ -141,6 +141,7 @@ Now you need a second preprocessing function to capture text truncated from any
>>> def group_texts(examples):
...     concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
...     total_length = len(concatenated_examples[list(examples.keys())[0]])
...     total_length = (total_length // block_size) * block_size
...     result = {
...         k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
...         for k, t in concatenated_examples.items()
......
@@ -141,6 +141,7 @@ [Spanish docs] Now you need a second preprocessing function to capture the text
>>> def group_texts(examples):
...     concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
...     total_length = len(concatenated_examples[list(examples.keys())[0]])
...     total_length = (total_length // block_size) * block_size
...     result = {
...         k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
...         for k, t in concatenated_examples.items()
......
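
The effect of the `total_length = (total_length // block_size) * block_size` line can be checked with a small standalone sketch. The `block_size` value and the toy batch below are illustrative assumptions, not values from the docs. The hunks above are cut off after the dict comprehension, so the closing brace and `return result` are also assumptions made only so the function runs on its own.

```python
# Illustrative value only; the documentation defines its own block_size.
block_size = 4


def group_texts(examples):
    # Flatten each column (a list of token lists) into one long list.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # Round down to a multiple of block_size so the short tail is dropped.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    # Assumed ending: the diff hunk is truncated before this point.
    return result


# Toy batch: 10 tokens in total, so 2 tokens remain after two blocks of 4.
batch = {"input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]}
grouped = group_texts(batch)
print(grouped["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Without the rounding step, `range(0, total_length, block_size)` would also yield a final start index of 8, producing a trailing block `[9, 10]` that is shorter than `block_size`. With it, every returned block has exactly `block_size` elements.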