Commit f7afe29e authored by Zhaoheng Ni's avatar Zhaoheng Ni Committed by Facebook GitHub Bot

Disable multiprocessing when dumping features in hubert preprocessing (#2311)

Summary:
Multiprocessing works well for MFCC features, but it sometimes makes the script hang when dumping HuBERT features. Changing it to a for-loop resolves the issue.
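
For reference, the change follows the general pattern sketched below: a `multiprocessing.Pool.starmap` fan-out over per-rank argument tuples is replaced by a plain sequential loop. This is a minimal, hypothetical sketch rather than the actual script; `process_shard` and its arguments stand in for the real `dump_features` call and its parameters.

```python
from multiprocessing import Pool


def process_shard(split, rank, num_rank):
    # Hypothetical stand-in for dump_features: process one shard of one split.
    print(f"{split}: shard {rank}/{num_rank}")


def run_with_pool(split, num_rank):
    # Before: fan the shards out across a process pool with starmap.
    inputs = [(split, rank, num_rank) for rank in range(1, num_rank + 1)]
    with Pool(num_rank) as p:
        p.starmap(process_shard, inputs)


def run_sequentially(split, num_rank):
    # After: run the shards one at a time in a plain for-loop.
    for rank in range(1, num_rank + 1):
        process_shard(split, rank, num_rank)


if __name__ == "__main__":
    run_with_pool("train", 4)
    run_sequentially("train", 4)
```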

Pull Request resolved: https://github.com/pytorch/audio/pull/2311

Reviewed By: mthrok

Differential Revision: D35393813

Pulled By: nateanl

fbshipit-source-id: afdc14557a1102b20ecd5fafba0964a913250a11
parent 11328d23
@@ -8,7 +8,6 @@ The script includes:
 """
 import logging
 from argparse import ArgumentParser, RawTextHelpFormatter
-from multiprocessing import Pool
 from pathlib import Path
 
 import torch
@@ -99,9 +98,8 @@ def main(args):
         feat_dir.mkdir()
 
     for split in ["train", "valid"]:
-        p = Pool(args.num_rank)
-        inputs = [
-            (
+        for rank in range(1, args.num_rank + 1):
+            dump_features(
                 tsv_dir / f"{args.dataset}_{split}.tsv",
                 feat_dir,
                 split,
@@ -113,11 +111,6 @@ def main(args):
                 args.checkpoint_path,
                 16_000,
             )
-            for rank in range(1, args.num_rank + 1)
-        ]
-        _ = p.starmap(dump_features, inputs)
-        p.close()
-        p.join()
 
     # Fit KMeans clustering model
     learn_kmeans(