1. 28 Jul, 2023 1 commit
    • Move TorchAudio-Squim models to Beta (#3512) · b7d2d928
      Zhaoheng Ni authored
      Summary:
      The PR moves the `SquimObjective` and `SquimSubjective` models, along with their factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3512
      
      Reviewed By: mthrok
      
      Differential Revision: D47837434
      
      Pulled By: nateanl
      
      fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8
  2. 23 Mar, 2023 1 commit
  3. 27 Feb, 2023 1 commit
  4. 15 Jan, 2023 1 commit
    • Add pre-trained pipelines for XLS-R models (#2978) · 9b7b64e4
      Zhaoheng Ni authored
      Summary:
      The PR adds three `Wav2Vec2Bundle` pipeline objects for XLS-R models:
      - WAV2VEC2_XLSR_300M
      - WAV2VEC2_XLSR_1B
      - WAV2VEC2_XLSR_2B
      
      All three models use layer normalization in the feature extraction layers, hence `_normalize_waveform` is set to `True`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2978
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42501491
      
      Pulled By: nateanl
      
      fbshipit-source-id: 2429ec880cc14798034843381e458e1b4664dac3
  5. 05 Jan, 2023 1 commit
    • Add HiFiGAN bundle (#2921) · 54e5c859
      Grigory Sizov authored
      Summary:
      Closes [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314)
      ## Description
      - Add bundle `HIFIGAN_GENERATOR_V3_LJSPEECH` to prototype. The bundle contains pre-trained HiFiGAN generator weights from the [original HiFiGAN publication](https://github.com/jik876/hifi-gan#pretrained-model), converted slightly to fit our model
      - Add tests
        - unit tests checking that vocoder and mel-transform implementations in the bundle give the same results as the original ones. Part of the original HiFiGAN code is ported to this repo to enable these tests
        - integration test checking that waveform reconstructed from mel spectrogram by the bundle is close enough to the original
      - Add docs
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2921
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D42034761
      
      Pulled By: sgrigory
      
      fbshipit-source-id: 8b0dadeed510b3c9371d6aa2c46ec7d8378f6048
  6. 09 Dec, 2022 1 commit
  7. 15 Nov, 2022 1 commit
  8. 14 Sep, 2022 1 commit
  9. 13 Sep, 2022 1 commit
  10. 03 Aug, 2022 2 commits
    • Add HDEMUCS_HIGH_MUSDB (#2601) · 6ecc11c2
      Sean Kim authored
      Summary:
      Add new model pretrained weights and tests
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2601
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D38396673
      
      Pulled By: skim0514
      
      fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1
    • An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a
      bshall authored
      Summary:
      I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
      - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
      - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
      - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
      - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
      
      I hope this is helpful! Looking forward to hearing from you.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2472
      
      Reviewed By: hwangjeff
      
      Differential Revision: D38389155
      
      Pulled By: carolineechen
      
      fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
  11. 26 Jul, 2022 1 commit
  12. 25 Jul, 2022 1 commit
  13. 22 Jul, 2022 1 commit
    • Add documents for SourceSeparationBundle (#2559) · 6cee56ab
      Zhaoheng Ni authored
      Summary:
      - Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
      - Add citation of Libri2Mix dataset in the bundle documentation.
      - The URL in the integration test should be joined with a slash rather than `os.path.join`, which inserts backslashes on Windows and breaks the URL. Change it to an f-string.
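
A minimal sketch of the URL issue described in the last bullet, assuming a hypothetical base URL and file name (neither is taken from the actual test). `ntpath` is the Windows flavor of `os.path`, so the behavior can be reproduced on any platform:

```python
import ntpath  # the module that backs os.path on Windows

# Hypothetical values, for illustration only.
BASE_URL = "https://example.com/assets"
FILE_NAME = "mixture.wav"

# On Windows, os.path.join inserts a backslash, which is not a URL separator.
windows_joined = ntpath.join(BASE_URL, FILE_NAME)

# An f-string always uses a forward slash, regardless of platform.
url = f"{BASE_URL}/{FILE_NAME}"

print(windows_joined)
print(url)
```

The f-string keeps the separator explicit, so the resulting URL does not depend on the operating system running the test.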
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2559
      
      Reviewed By: carolineechen
      
      Differential Revision: D38036116
      
      Pulled By: nateanl
      
      fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836
  14. 21 Jul, 2022 1 commit
    • Add SourceSeparationBundle to prototype (#2440) · 83362580
      Zhaoheng Ni authored
      Summary:
      - Add SourceSeparationBundle class for source separation pipeline
      - Add `CONVTASNET_BASE_LIBRI2MIX` that is trained on Libri2Mix dataset.
      - Add integration test with example mixture audio and expected scale-invariant signal-to-distortion ratio (Si-SDR) score. The test computes the Si-SDR score with the permutation-invariant training (PIT) criterion for all permutations of sources and uses the highest value as the final output. The test verifies that the score is equal to or larger than the expected value.
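
The scoring idea in the integration-test bullet can be sketched roughly as follows. This is not the test's actual code: it assumes plain Python lists instead of tensors, averages the score over sources, skips mean removal, and adds a small `eps` for numerical safety. The names `si_sdr` and `pit_si_sdr` are hypothetical.

```python
import itertools
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference to remove any scaling.
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


def pit_si_sdr(estimates, references):
    """Highest mean Si-SDR over all assignments of estimates to references."""
    best = -math.inf
    for perm in itertools.permutations(range(len(references))):
        scores = [si_sdr(estimates[i], references[p]) for i, p in enumerate(perm)]
        best = max(best, sum(scores) / len(scores))
    return best


# Two toy sources; the estimates come out in swapped order and at a different scale.
ref_a = [1.0, 0.0, -1.0, 0.0]
ref_b = [0.0, 1.0, 0.0, -1.0]
est_1 = [0.1, 2.0, 0.0, -2.0]   # roughly 2 * ref_b
est_2 = [1.0, 0.1, -1.0, 0.0]   # roughly ref_a

score = pit_si_sdr([est_1, est_2], [ref_a, ref_b])
print(f"best Si-SDR over permutations: {score:.2f} dB")
```

Because every permutation is tried, the score does not depend on the order in which the model emits the separated sources, which is why a test can compare it against a single expected threshold.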
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2440
      
      Reviewed By: mthrok
      
      Differential Revision: D37997646
      
      Pulled By: nateanl
      
      fbshipit-source-id: c951bcbbe8b7ed9553cb8793d6dc1ef90d5a29fe
  15. 27 Jun, 2022 1 commit
  16. 01 Jun, 2022 1 commit
    • Move CTC beam search decoder to beta (#2410) · 93024ace
      Caroline Chen authored
      Summary:
      Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.
      
      hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2410
      
      Reviewed By: mthrok
      
      Differential Revision: D36784521
      
      Pulled By: carolineechen
      
      fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
  17. 15 May, 2022 1 commit
    • [codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best-effort to move associated
      comments to match merged elements, but there are known limitations due to
      the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
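
As a rough illustration of the ordering difference described above (not µsort's actual implementation), Python's default string sort can be compared with a case-insensitive one. The module names are made up for the example:

```python
# Hypothetical module names, for illustration only.
names = ["frog", "Zebra", "FROG", "apple", "Frog"]

# Default sort compares code points, so uppercase names come before lowercase ones.
case_sensitive = sorted(names)

# A case-insensitive key keeps "frog", "FROG", and "Frog" adjacent.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```

With the case-insensitive key, names that differ only in case end up next to each other; their relative order follows the input because Python's sort is stable.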
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
      
      Reviewed By: lisroach
      
      Differential Revision: D36402214
      
      fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
  18. 26 Apr, 2022 1 commit
  19. 21 Apr, 2022 1 commit
    • Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29
      hwangjeff authored
      Summary:
      PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
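
A simplified sketch of the kind of change described, assuming a hypothetical two-field hypothesis (tokens and score). The field layout and helper names are illustrative and are not taken from the TorchAudio source:

```python
from typing import List, NamedTuple, Tuple


# Before (hypothetical): a custom NamedTuple class.
class HypothesisNT(NamedTuple):
    tokens: List[int]
    score: float


# After (hypothetical): a plain tuple type alias plus accessor helpers,
# so containers hold only built-in types.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    return hypo[1]


hypo: Hypothesis = ([1, 2, 3], -0.5)
print(get_tokens(hypo), get_score(hypo))
```

Accessor functions preserve readable field access while the values passed around remain ordinary tuples.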
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2339
      
      Reviewed By: nateanl
      
      Differential Revision: D35806529
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
  20. 25 Mar, 2022 1 commit
  21. 22 Mar, 2022 1 commit
    • Add download utility specialized for torchaudio (#2283) · 64b98521
      moto authored
      Summary:
      In recent updates, torchaudio added features that download assets/models from
      download.pytorch.org/torchaudio.
      
      To reduce code duplication, the implementation uses utilities from
      ``torch.hub``, but still, there are patterns repeated in implementing
      the fetch mechanism, notably cache and local file path handling.
      
      This commit introduces a utility function that handles
      download/cache/local path management that can be used for
      fetching pre-trained model data.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2283
      
      Reviewed By: carolineechen
      
      Differential Revision: D35050469
      
      Pulled By: mthrok
      
      fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b
  22. 01 Feb, 2022 1 commit
    • Move ASR features out of prototype (#2187) · aca5591c
      hwangjeff authored
      Summary:
      Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2187
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D33918092
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
  23. 26 Jan, 2022 1 commit
  24. 30 Dec, 2021 1 commit
  25. 23 Dec, 2021 1 commit
  26. 04 Nov, 2021 1 commit
    • Consolidate network utils (#1974) · 536e8ac0
      moto authored
      This commit changes all the `torch.hub` network utility functions to
      be imported from `torchaudio._internal`, so that later we can replace
      the function within fbcode.
  27. 03 Nov, 2021 1 commit
  28. 02 Nov, 2021 3 commits
  29. 27 Oct, 2021 1 commit
  30. 25 Oct, 2021 1 commit
  31. 22 Oct, 2021 1 commit
  32. 21 Oct, 2021 1 commit
  33. 15 Oct, 2021 2 commits
    • Add TTS bundle/pipelines (#1872) · e885204e
      moto authored
      Future work items:
      - length computation of GriffinLim
      - better way to make InverseMelScale work in inference_mode
    • Move wav2vec2 pretrained models to pipelines module (#1876) · fad855cd
      moto authored
      - Move wav2vec2 pretrained weights to `torchaudio.pipelines` namespace to align with #1872.
      - Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-trained models) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR).
      - Update base URL
  34. 08 Oct, 2021 1 commit
  35. 06 Oct, 2021 2 commits