1. 22 Jul, 2022 1 commit
    • Add documents for SourceSeparationBundle (#2559) · 6cee56ab
      Zhaoheng Ni authored
      Summary:
      - Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
      - Add citation of Libri2Mix dataset in the bundle documentation.
      - The URL in the integration test should be joined with forward slashes rather than `os.path.join`, which fails on Windows. Change it to an f-string.
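
The Windows failure described in the last item can be reproduced on any platform with `ntpath`, the Windows flavor of `os.path`. This is only an illustrative sketch: the base URL and file name are hypothetical placeholders, not the values used in the integration test.

```python
import ntpath  # the implementation behind os.path on Windows

# Hypothetical values for illustration; not the URL used in the integration test.
base_url = "https://example.com/models"
file_name = "mixture.wav"

# On Windows, os.path.join inserts a backslash, which is not a URL separator.
joined_with_os_path = ntpath.join(base_url, file_name)

# An f-string produces a forward slash regardless of platform.
joined_with_fstring = f"{base_url}/{file_name}"

print(joined_with_os_path)
print(joined_with_fstring)
```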
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2559
      
      Reviewed By: carolineechen
      
      Differential Revision: D38036116
      
      Pulled By: nateanl
      
      fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836
  2. 21 Jul, 2022 1 commit
    • Add SourceSeparationBundle to prototype (#2440) · 83362580
      Zhaoheng Ni authored
      Summary:
      - Add SourceSeparationBundle class for source separation pipeline
      - Add `CONVTASNET_BASE_LIBRI2MIX` that is trained on Libri2Mix dataset.
      - Add integration test with example mixture audio and expected scale-invariant signal-to-distortion ratio (Si-SDR) score. The test computes the Si-SDR score with the permutation-invariant training (PIT) criterion for all permutations of sources and uses the highest value as the final output. The test verifies that the score is equal to or larger than the expected value.
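
The scoring step described in the last item can be sketched in plain Python. This is not the test's actual code: the function names are hypothetical, signals are plain lists, and mean removal (often applied before Si-SDR) is omitted for brevity.

```python
import itertools
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference) + eps
    scale = dot / ref_energy
    # Project the estimate onto the reference to get the scaled target.
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise) + eps
    return 10 * math.log10(target_energy / noise_energy + eps)


def best_permutation_si_sdr(estimates, references):
    """Return the highest mean Si-SDR over all orderings of the estimates."""
    n = len(references)
    best = -math.inf
    for perm in itertools.permutations(range(n)):
        score = sum(si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)) / n
        best = max(best, score)
    return best


# Toy two-source example: the estimates are rescaled, slightly noisy, and swapped.
references = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
estimates = [[0.0, 2.0, 0.1, -2.0], [0.5, 0.0, -0.5, 0.1]]
score = best_permutation_si_sdr(estimates, references)
```

Taking the maximum over permutations makes the score independent of the order in which the model emits the separated sources.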
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2440
      
      Reviewed By: mthrok
      
      Differential Revision: D37997646
      
      Pulled By: nateanl
      
      fbshipit-source-id: c951bcbbe8b7ed9553cb8793d6dc1ef90d5a29fe
  3. 19 Jul, 2022 1 commit
  4. 12 Jul, 2022 1 commit
  5. 07 Jul, 2022 1 commit
  6. 06 Jul, 2022 1 commit
    • Fix fluent test for windows (#2510) · 09daa438
      Caroline Chen authored
      Summary:
      The fluent dataset test currently fails on Windows, due to newline generation by the CSV writer in the test and incorrect path parsing in the dataset implementation.
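
The summary does not spell out the exact cause. One common source of this kind of failure is opening a CSV file without `newline=""`, which on Windows makes `csv.writer` emit `\r\r\n` line endings. A minimal sketch under that assumption, with hypothetical file and column names:

```python
import csv
import os
import tempfile

# Hypothetical rows for illustration; not the dataset's real schema.
rows = [["speaker_id", "path"], ["spk1", "wavs/spk1/a.wav"]]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")

    # newline="" lets the csv module control line endings itself,
    # avoiding the extra "\r" that text mode would add on Windows.
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

    with open(csv_path, "rb") as f:
        raw = f.read()

    with open(csv_path, newline="") as f:
        read_back = list(csv.reader(f))
```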
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2510
      
      Reviewed By: carolineechen
      
      Differential Revision: D37573203
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4868bc649690c7e596b002686c6128ce735d3564
  7. 28 Jun, 2022 1 commit
    • Refactor AVDictionary clean up (#2507) · 0ad03adf
      moto authored
      Summary:
      Small clean up in ffmpeg binding code.
      
      1. Make `get_option_dict` and `clean_up_dict` public utilities
      2. Merge the exception into `clean_up_dict`
      3. Get rid of custom string join function and use `c10::Join`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2507
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37466022
      
      Pulled By: mthrok
      
      fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969
  8. 27 Jun, 2022 4 commits
  9. 23 Jun, 2022 1 commit
  10. 21 Jun, 2022 1 commit
    • Create musdb handler and tests (#2484) · b92a8a09
      Sean Kim authored
      Summary:
      Create dataset handler and tests for new dataset. Manually tested and unit tested to test validity. Pre-commit ran for style checks.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2484
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D37250556
      
      Pulled By: skim0514
      
      fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51
  11. 20 Jun, 2022 1 commit
  12. 13 Jun, 2022 1 commit
  13. 10 Jun, 2022 1 commit
  14. 08 Jun, 2022 2 commits
  15. 07 Jun, 2022 1 commit
  16. 04 Jun, 2022 1 commit
    • Make FFmpeg log level configurable (#2439) · 877a88c5
      moto authored
      Summary:
      Undesired logs are one of the loudest UX complaints we get.
      Yet, loading media files involves uncertainty which is
      difficult to debug without debug logs.
      
      This commit introduces utility functions to configure logging level
      so that we can ask users to enable it when they encounter an issue,
      while defaulting to a non-verbose option.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2439
      
      Reviewed By: hwangjeff, xiaohui-zhang
      
      Differential Revision: D36903763
      
      Pulled By: mthrok
      
      fbshipit-source-id: f4ddd9915b13197c2a2eb97e965005b8b5b8d987
  17. 03 Jun, 2022 1 commit
  18. 02 Jun, 2022 3 commits
  19. 01 Jun, 2022 3 commits
  20. 31 May, 2022 1 commit
  21. 29 May, 2022 1 commit
    • Update source info (#2418) · bb77cbeb
      moto authored
      Summary:
      Add num_frames and bits_per_sample to match with the current
      `torchaudio.info` capability.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2418
      
      Reviewed By: carolineechen
      
      Differential Revision: D36749077
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713
  22. 23 May, 2022 2 commits
    • Add assertion checks to multi-channel functions (#2401) · 38e530d7
      Zhaoheng Ni authored
      Summary:
      - The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices.
      - The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask.
      - The shapes of input Tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc.
      - The autograd unit test of `apply_beamforming` had wrong dimensions for `beamform_weights`, which the assertion check detected. Fix it in this PR.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2401
      
      Reviewed By: carolineechen
      
      Differential Revision: D36597689
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6
    • Add LibriLightLimited dataset (#2302) · af9cab3b
      Zhaoheng Ni authored
      Summary:
      The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is a supervised subset of the [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the unsupervised subset from the supervised one, it's clearer to put it in a separate dataset class for fine-tuning purposes.
      It contains "10 min", "1 hour", "10 hour" splits.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2302
      
      Reviewed By: mthrok
      
      Differential Revision: D36388188
      
      Pulled By: nateanl
      
      fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249
  23. 21 May, 2022 1 commit
    • Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
        - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
        - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments is changed.
        - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
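
The file-like protocol described in the list above can be sketched with two small classes. The class and function names here are hypothetical and only illustrate which methods a source object is expected to provide; they are not part of torchaudio.

```python
import io


class ReadOnlySource:
    """Hypothetical source exposing only `read`, e.g. a network stream."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to n bytes; an empty bytes object signals end of stream.
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """Same as above, but also supports `seek`, which some formats need."""

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buffer.seek(offset, whence)


def describe_source(src) -> str:
    """Classify a source along the lines of the summary (illustrative only)."""
    if isinstance(src, str):
        return "path"
    if hasattr(src, "read"):
        return "seekable file-like" if hasattr(src, "seek") else "file-like without seek"
    raise TypeError("src must be a string or implement read(n)")


src = SeekableSource(b"0123456789")
first = src.read(4)
src.seek(0)          # rewind, which a decoder may do while probing the format
again = src.read(4)
```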
      
      ## Code structure
      
      The approach is very similar to how file-like objects are supported in sox-based I/O.
      In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
      if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some implementation in the original TorchBind surface layer is converted to a wrapper class so that it can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
        - `add_basic_[audio|video]_stream` methods were removed from the C++ layer as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, Streamer core methods now deal with them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
  24. 20 May, 2022 1 commit
  25. 19 May, 2022 1 commit
    • Refactor Streamer implementation (#2402) · eed57534
      moto authored
      Summary:
      * Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be re-used in PyBind11.
      * Move `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This will make the PyBind11-based binding simpler as it need not deal with dtype.
      * Move `add_[audio|video]_stream` wrapper signature to Streamer core, so that Streamer directly deals with `c10::optional`.†
      
      † Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as an empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the corresponding filter expression that is actually being used.
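
The behavior change in the footnote can be illustrated with a small sketch. `resolve_filter_desc` is a hypothetical helper, not a torchaudio function; it only relies on the fact that `anull` and `null` are FFmpeg's pass-through filters for audio and video respectively.

```python
from typing import Optional


def resolve_filter_desc(filter_desc: Optional[str], media_type: str) -> str:
    """Return the filter expression that is actually applied (hypothetical helper)."""
    if filter_desc:
        return filter_desc
    # An empty expression falls back to FFmpeg's pass-through filter.
    if media_type == "audio":
        return "anull"
    if media_type == "video":
        return "null"
    raise ValueError(f"unsupported media type: {media_type!r}")


reported_audio = resolve_filter_desc("", "audio")
reported_video = resolve_filter_desc(None, "video")
```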
      
      Ref https://github.com/pytorch/audio/issues/2400
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2402
      
      Reviewed By: nateanl
      
      Differential Revision: D36488808
      
      Pulled By: mthrok
      
      fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
  26. 15 May, 2022 1 commit
    • [codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best-effort to move associated
      comments to match merged elements, but there are known limitations due to
      the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
      
      Reviewed By: lisroach
      
      Differential Revision: D36402214
      
      fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
  27. 13 May, 2022 1 commit
    • Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of prototype module.
      
      * The related classes are renamed as follows
      
        - `Streamer` -> `StreamReader`.
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change is a preemptive measure for the possibility of adding a
      `StreamWriter` API.
      
      * Replace BUILD_FFMPEG build arg with USE_FFMPEG
      
      We are not building FFmpeg, so USE_FFMPEG is more appropriate
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
  28. 12 May, 2022 3 commits
    • Use module-level `__getattr__` to implement delayed initialization (#2377) · 9499f642
      moto authored
      Summary:
      This commit updates the lazy module initialization logic for
      `torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`.
      
      - The modules are importable regardless of optional dependencies.
      i.e. `import torchaudio.prototype.io` does not trigger the check for
      optional dependencies.
      
      - Optional dependencies are checked when the actual
      API is imported for the first time.
      i.e. `from torchaudio.prototype.io import Streamer` triggers the check
      for optional dependencies.
      
      The downside is that:
      
      - `import torchaudio.prototype.io.Streamer` no longer works.
      
      ## Details:
      
      Starting from Python 3.7, modules can have a `__getattr__` function,
      which serves as a fallback when normal attribute lookup on the module
      fails.
      
      This can be used to implement lazy import.
      
      ```python
      def __getattr__(name):
          global pi
          if name == 'pi':
              import math
              pi = math.pi
              return pi
          raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
      ```
      
      Ref: https://twitter.com/raymondh/status/1094686528440168453
      
      The implementation performs lazy import for the APIs that work with
      external/optional dependencies. In addition, it also checks that the
      binding is initialized only once.
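
The combination of lazy import and one-time initialization can be sketched with an in-memory module. Everything here is a stand-in: `lazy_io`, `_init_binding`, and the placeholder `Streamer` class are hypothetical and do not reproduce the torchaudio implementation.

```python
import sys
import types

init_calls = []  # records each time the stand-in binding is initialized


def _init_binding():
    # Stand-in for checking an optional dependency and loading its binding.
    init_calls.append("initialized")


def _lazy_getattr(name):
    if name == "Streamer":
        if not init_calls:
            _init_binding()

        class Streamer:
            """Placeholder for an API that needs the optional dependency."""

        # Cache on the module so __getattr__ is not consulted again.
        setattr(sys.modules["lazy_io"], name, Streamer)
        return Streamer
    raise AttributeError(f"module 'lazy_io' has no attribute {name!r}")


# Build a throwaway module in memory; "lazy_io" is a hypothetical name.
module = types.ModuleType("lazy_io")
module.__getattr__ = _lazy_getattr
sys.modules["lazy_io"] = module

import lazy_io  # importing the module does not initialize the binding

calls_after_import = len(init_calls)

from lazy_io import Streamer  # first access to the API triggers initialization

calls_after_from_import = len(init_calls)
```

Caching the resolved attribute on the module means later accesses bypass `__getattr__` entirely, so the initialization check runs at most once per attribute.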
      
      ## Why is this the preferable approach?
      
      Previously, the optional dependencies were checked at the time the module
      is imported:
      
      https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5
      
      As long as this module is in `prototype`, which we ask users to import
      explicitly, users had control over whether to install
      the optional dependencies.
      
      This approach only works for one optional dependency per module.
      Say we add a different I/O library as an optional dependency; we would need to
      put all the APIs in a dedicated submodule. This prevents us from having
      a flat namespace.
      i.e. the I/O modules with multiple optional dependencies would look like
      
      ```python
      # Client code
      from torchaudio.io.foo import FooFeature
      from torchaudio.io.bar import BarFeature
      ```
      
      where the new approach would allow
      
      ```python
      # Client code
      from torchaudio.io import FooFeature, BarFeature
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2377
      
      Reviewed By: nateanl
      
      Differential Revision: D36305603
      
      Pulled By: mthrok
      
      fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae
    • Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680
      Zhaoheng Ni authored
      Summary:
      - When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, the value can sometimes be negative, which results in an empty label. After zero-padding, such a training example hurts performance (i.e., the labels are all zero for the input waveform).
      This PR fixes the bug by checking whether `label_start` is negative and changing it to zero if so.
      - If `pad` is True, the `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training.
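
The clamping fix in the first item can be sketched with integer arithmetic. The default values below are assumptions for illustration (kernel size and stride in milliseconds, sample rate in samples per millisecond) and are not taken from the PR; the function name is hypothetical.

```python
def label_start_index(
    audio_start: int,
    kernel_size: int = 25,   # assumed: milliseconds
    stride: int = 20,        # assumed: milliseconds
    sample_rate: int = 16,   # assumed: samples per millisecond
) -> int:
    """Map a waveform crop offset (in samples) to a label frame index."""
    # Python's // floors toward negative infinity, like rounding_mode="floor".
    raw = (audio_start - kernel_size * sample_rate) // (stride * sample_rate)
    # Crops that start inside the first kernel window would give a negative
    # index, so clamp to zero to avoid an empty label.
    return max(raw, 0)


early_crop = label_start_index(100)    # raw value would be -1
later_crop = label_start_index(1000)
```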
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2296
      
      Reviewed By: mthrok
      
      Differential Revision: D36323217
      
      Pulled By: nateanl
      
      fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
    • [black][codemod] formatting changes from black 22.3.0 · 595dc5d3
      John Reese authored
      Summary:
      Applies the black-fbsource codemod with the new build of pyfmt.
      
      paintitblack
      
      Reviewed By: lisroach
      
      Differential Revision: D36324783
      
      fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
  29. 11 May, 2022 1 commit
    • Move FFmpeg integrity test from conda smoke test to custom smoke test (#2381) · 9877f544
      moto authored
      Summary:
      Conda package build performs simple smoke test, which is different
      from smoke_test jobs we define on our CI jobs.
      
      Currently the Conda packaging smoke test verifies the importability of
      `torchaudio.prototype.io`, which requires FFmpeg 4.
      
      1. We list FFmpeg 4 as a runtime requirement, but this means that
      conda's dependency resolver takes FFmpeg 4 into consideration.
      FFmpeg 5 was released this year, and we can expect that the user base
      will move to FFmpeg 5 gradually. If the user environment has some constraint
      on FFmpeg, torchaudio will conflict with it, which will prevent users
      from installing torchaudio.
      
      2. In #2377 the way optional dependency is checked/initialized is changed,
      so this Conda smoke test will no longer check the integrity with FFmpeg libraries.
      
      To solve the issues above, this commit moves the part that tests integrity with
      FFmpeg libraries to the smoke test we define on CircleCI.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2381
      
      Reviewed By: carolineechen
      
      Differential Revision: D36323706
      
      Pulled By: mthrok
      
      fbshipit-source-id: 57ca816e0f3ad8e16d21e56062f6ed8a09ab93a3