Unverified Commit c98a6ac2 authored by Sylvain Gugger, committed by GitHub

Use argument for preprocessing workers in run_summarization (#15394)

parent db079567
@@ -443,6 +443,7 @@ def main():
     processed_datasets = raw_datasets.map(
         preprocess_function,
         batched=True,
+        num_proc=args.preprocessing_num_workers,
         remove_columns=column_names,
         load_from_cache_file=not args.overwrite_cache,
         desc="Running tokenizer on dataset",
...
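
The change above forwards the command-line value to the `num_proc` keyword of `raw_datasets.map`, so the tokenization step can run in several processes. A minimal sketch of that wiring follows. It assumes the flag is declared as an optional integer defaulting to None (in which case `map` runs in a single process); `parse_args` and `build_map_kwargs` are hypothetical helpers written for this illustration and are not part of the script.

```python
import argparse


def parse_args(argv=None):
    """Parse only the two flags that the diff reads (assumed definitions)."""
    parser = argparse.ArgumentParser()
    # Assumption: optional integer flag, None when the user does not pass it.
    parser.add_argument("--preprocessing_num_workers", type=int, default=None)
    # Assumption: boolean switch that forces the cached result to be rebuilt.
    parser.add_argument("--overwrite_cache", action="store_true")
    return parser.parse_args(argv)


def build_map_kwargs(args, column_names):
    """Hypothetical helper: collect the keyword arguments shown in the diff."""
    return {
        "batched": True,
        # The line added by this commit: forward the worker count.
        "num_proc": args.preprocessing_num_workers,
        "remove_columns": column_names,
        "load_from_cache_file": not args.overwrite_cache,
        "desc": "Running tokenizer on dataset",
    }


if __name__ == "__main__":
    args = parse_args(["--preprocessing_num_workers", "4"])
    kwargs = build_map_kwargs(args, ["document", "summary"])
    # In the script these would be passed as
    # raw_datasets.map(preprocess_function, **kwargs).
    print(kwargs["num_proc"])
```

Before this change the flag could be parsed but was never passed to `map`, so setting it had no effect on preprocessing.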