1. 19 Dec, 2022 1 commit
    • Split extract_archive into dedicated functions. (#2927) · 5807078c
      moto authored
      Summary:
      `extract_archive` in `datasets.utils` does not check the input type: it blindly tries to open the archive as tar, then falls back to zip if that fails.
      
      This is an anti-pattern. Every dataset implementation already knows the archive type of the files it downloads.
      
      This commit splits the `extract_archive` function into dedicated functions and makes each dataset use the correct one.
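
The split described above can be illustrated with a minimal sketch using only the Python standard library. The function names `extract_tar` and `extract_zip`, their signatures, and their return values are assumptions made for illustration; they are not necessarily what this commit introduced.

```python
import os
import tarfile
import zipfile
from typing import List, Optional


def _default_target(from_path: str, to_path: Optional[str]) -> str:
    # Hypothetical helper: extract next to the archive unless told otherwise.
    if to_path is None:
        to_path = os.path.dirname(os.path.abspath(from_path))
    os.makedirs(to_path, exist_ok=True)
    return to_path


def extract_tar(from_path: str, to_path: Optional[str] = None) -> List[str]:
    """Extract a tar archive (plain or compressed) and return the member paths."""
    to_path = _default_target(from_path, to_path)
    with tarfile.open(from_path, "r") as tar:
        names = tar.getnames()
        tar.extractall(to_path)
    return [os.path.join(to_path, name) for name in names]


def extract_zip(from_path: str, to_path: Optional[str] = None) -> List[str]:
    """Extract a zip archive and return the member paths."""
    to_path = _default_target(from_path, to_path)
    with zipfile.ZipFile(from_path, "r") as zf:
        names = zf.namelist()
        zf.extractall(to_path)
    return [os.path.join(to_path, name) for name in names]
```

With dedicated functions, passing the wrong archive type fails immediately (for example, `zipfile.BadZipFile` when `extract_zip` is given a tar file) instead of being masked by a silent fallback.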
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2927
      
      Reviewed By: carolineechen
      
      Differential Revision: D42154069
      
      Pulled By: mthrok
      
      fbshipit-source-id: bc46cc2af26aa086ef49aa1f9a94b6dedb55f85e
  2. 19 Oct, 2022 1 commit
  3. 22 Sep, 2022 1 commit
  4. 21 Sep, 2022 1 commit
    • Add metadata mode for various datasets (#2697) · ad2b61d7
      Caroline Chen authored
      Summary:
      Add metadata mode for the following SUPERB benchmark datasets:
      - QUESST14
      - Fluent Speech Commands
      - VoxCeleb1
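
As a rough sketch of what a metadata mode can look like, assuming it means returning the audio file path in place of the decoded waveform so that no audio is read. `ToyDataset`, its fields, and its placeholder loader are hypothetical and are not torchaudio classes:

```python
from typing import List, Tuple


class ToyDataset:
    """Hypothetical dataset used only to illustrate the pattern."""

    def __init__(self, entries: List[Tuple[str, int, int]]) -> None:
        # Each entry is (file path, sample rate, speaker id).
        self._entries = entries

    def _load_waveform(self, path: str) -> List[float]:
        # Stand-in for real audio decoding; returns placeholder samples.
        return [0.0, 0.0, 0.0, 0.0]

    def get_metadata(self, n: int) -> Tuple[str, int, int]:
        # Metadata mode: the file path takes the place of the waveform.
        return self._entries[n]

    def __getitem__(self, n: int) -> Tuple[List[float], int, int]:
        # Regular mode: same fields, with the waveform loaded from the path.
        path, sample_rate, speaker_id = self.get_metadata(n)
        return self._load_waveform(path), sample_rate, speaker_id

    def __len__(self) -> int:
        return len(self._entries)
```

Building `__getitem__` on top of `get_metadata` keeps the two modes consistent: they differ only in the first field.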
      
      Follow-ups:
      - Add metadata mode for LibriMix -- waiting for unit tests to merge
      - Add IEMOCAP + SNIPS datasets
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2697
      
      Reviewed By: mthrok
      
      Differential Revision: D39666809
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 3a8f07627acceed70f960f47e694efad75b108c2
  5. 15 Sep, 2022 1 commit
  6. 26 Jul, 2022 1 commit
  7. 27 Jun, 2022 1 commit
    • Add VoxCeleb1 dataset (#2349) · 21b2d139
      Zhaoheng Ni authored
      Summary:
      This PR adds two dataset classes for the VoxCeleb1 corpus.
      - `VoxCeleb1Identification`
      Each data sample contains the waveform, the sample rate, the speaker id, and the file id.
      - `VoxCeleb1Verification`
      Each data sample contains a pair of waveforms, the sample rate, a label indicating whether they come from the same speaker, and the two file ids.
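
The two sample layouts described above can be written down as plain tuples. This is only a sketch: the field names, the field types, and the list-of-floats stand-in for a waveform tensor are assumptions, not the definitions used in the PR.

```python
from typing import List, NamedTuple

# Placeholder for an audio tensor; the real datasets return tensors.
Waveform = List[float]


class IdentificationSample(NamedTuple):
    waveform: Waveform
    sample_rate: int
    speaker_id: int  # assumed to be an integer label
    file_id: str


class VerificationSample(NamedTuple):
    waveform_1: Waveform
    waveform_2: Waveform
    sample_rate: int
    label: int  # assumed: 1 if both waveforms share a speaker, else 0
    file_id_1: str
    file_id_2: str


def same_speaker(sample: VerificationSample) -> bool:
    # Interpret the label under the assumption stated above.
    return sample.label == 1
```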
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2349
      
      Reviewed By: carolineechen
      
      Differential Revision: D35927921
      
      Pulled By: nateanl
      
      fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449