Commit f25457b2, authored by Sylvain Gugger, committed by GitHub

Fix number of examples for iterable dataset in distributed training (#17951)

parent e4d25885
@@ -1088,6 +1088,10 @@ class Trainer:
        dataloader.dataset does not exist or has no length, estimates as best it can
        """
        try:
            dataset = dataloader.dataset
            # Special case for IterableDatasetShard, we need to dig deeper
            if isinstance(dataset, IterableDatasetShard):
                return len(dataloader.dataset.dataset)
            return len(dataloader.dataset)
        except (NameError, AttributeError, TypeError):  # no dataset or length, estimate by length of dataloader
            return len(dataloader) * self.args.per_device_train_batch_size
......
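
The hunk above can be illustrated with a minimal, standard-library-only sketch. Everything here is a stand-in: `SizedStream`, `ShardStandIn`, `LoaderStandIn`, `BareLoader`, and the free function `num_examples` are hypothetical names, not the real `Trainer` or `IterableDatasetShard`. The sketch also assumes that a shard reports only its own process's share as its length, which is the situation in which digging into the wrapped dataset changes the result.

```python
import math


class SizedStream:
    """Hypothetical iterable dataset that also knows its length."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

    def __len__(self):
        return self.n


class ShardStandIn:
    """Hypothetical stand-in for IterableDatasetShard (not the real class)."""

    def __init__(self, dataset, num_processes):
        self.dataset = dataset
        self.num_processes = num_processes

    def __len__(self):
        # Assumption: a shard reports only the share seen by one process.
        return math.ceil(len(self.dataset) / self.num_processes)


class LoaderStandIn:
    """Hypothetical dataloader exposing .dataset and a length in batches."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.dataset) / self.batch_size)


class BareLoader:
    """Hypothetical dataloader with no .dataset attribute at all."""

    def __init__(self, num_batches):
        self.num_batches = num_batches

    def __len__(self):
        return self.num_batches


def num_examples(dataloader, per_device_batch_size):
    """Mirror the patched logic as a free function."""
    try:
        dataset = dataloader.dataset
        # Special case for the shard wrapper: use the wrapped dataset's length.
        if isinstance(dataset, ShardStandIn):
            return len(dataset.dataset)
        return len(dataset)
    except (NameError, AttributeError, TypeError):
        # No dataset or no length: estimate from the number of batches.
        return len(dataloader) * per_device_batch_size


shard = ShardStandIn(SizedStream(1000), num_processes=4)
total = num_examples(LoaderStandIn(shard, batch_size=8), 8)
estimate = num_examples(BareLoader(num_batches=5), 8)
```

Under these assumptions, `len(shard)` is the per-process share while `total` is the size of the full wrapped dataset, and `estimate` shows the fallback path taken when the dataloader has no `dataset` attribute.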