Sync preprocesses before loading the processor at run_speech_recognition_ctc.py (#21926)

* Update run_speech_recognition_ctc.py

  Make sure all processes wait until data is saved before loading the processor from the output_dir

* Make sure all processes wait until data is saved before loading the processor from the output_dir
* Update run_speech_recognition_ctc.py
* Update run_speech_recognition_seq2seq.py

Commit d5239bab (unverified), authored Apr 05, 2023 by Mikel Penagarikano, committed by GitHub Apr 05, 2023
2 changed files
--- a/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py
+++ b/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py
@@ -673,6 +673,9 @@ def main():
        return metrics

    # Now save everything to be able to create a single processor later
+    # make sure all processes wait until data is saved
+    with training_args.main_process_first():
+        # only the main process saves them
        if is_main_process(training_args.local_rank):
            # save feature extractor, tokenizer and config
            feature_extractor.save_pretrained(training_args.output_dir)

--- a/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py
+++ b/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py
@@ -506,6 +506,9 @@ def main():
        return {"wer": wer}

    # 9. Create a single speech processor
+    # make sure all processes wait until data is saved
+    with training_args.main_process_first():
+        # only the main process saves them
        if is_main_process(training_args.local_rank):
            # save feature extractor, tokenizer and config
            feature_extractor.save_pretrained(training_args.output_dir)
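
The synchronization that both hunks rely on can be sketched in isolation. The example below is a minimal illustration, not the library's implementation: it assumes threads in place of distributed processes, uses `threading.Barrier` in place of a distributed barrier, and defines a hypothetical helper named `main_process_first` that only mimics the "main rank runs first, the others wait" behavior. The file name `preprocessor_config.json` and the `WORLD_SIZE` value are likewise placeholders chosen for the demo.

```python
import contextlib
import tempfile
import threading
from pathlib import Path

WORLD_SIZE = 4  # assumed number of workers for the demo


@contextlib.contextmanager
def main_process_first(rank, barrier):
    """Hypothetical stand-in: rank 0 runs the body first, other ranks wait."""
    try:
        if rank != 0:
            # Non-main ranks block here until rank 0 has left the block.
            barrier.wait()
        yield
    finally:
        if rank == 0:
            # Rank 0 has finished its work and releases the waiting ranks.
            barrier.wait()


def worker(rank, barrier, output_dir, results):
    config_path = output_dir / "preprocessor_config.json"
    with main_process_first(rank, barrier):
        # Only the main rank writes the files.
        if rank == 0:
            config_path.write_text("{}")
    # Past the block, every rank can rely on the file being present.
    results[rank] = config_path.exists()


def run_demo():
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        output_dir = Path(tmp)
        barrier = threading.Barrier(WORLD_SIZE)
        threads = [
            threading.Thread(target=worker, args=(r, barrier, output_dir, results))
            for r in range(WORLD_SIZE)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

Without the context manager, a non-main worker could reach the read step before rank 0 has written the file, which is the race this commit is meant to close.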