1. 07 Jun, 2022 3 commits
  2. 04 Jun, 2022 1 commit
  3. 03 Jun, 2022 3 commits
  4. 02 Jun, 2022 1 commit
    • Update MVDR beamforming tutorial (#2398) · d01f5891
      Zhaoheng Ni authored
      Summary:
      - Use `download_asset` to download audio files.
      - Replace the `MVDR` module with the newly added `SoudenMVDR` and `RTFMVDR` modules.
      - Benchmark the performance of `F.rtf_evd` and `F.rtf_power` for RTF computation.
      - Visualize the spectrograms and masks.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2398
      
      Reviewed By: carolineechen
      
      Differential Revision: D36549402
      
      Pulled By: nateanl
      
      fbshipit-source-id: dfd6754e6c33246e6991ccc51c4603b12502a1b5
  5. 01 Jun, 2022 1 commit
    • Move CTC beam search decoder to beta (#2410) · 93024ace
      Caroline Chen authored
      Summary:
      Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.
      
      hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2410
      
      Reviewed By: mthrok
      
      Differential Revision: D36784521
      
      Pulled By: carolineechen
      
      fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
  6. 26 May, 2022 1 commit
  7. 23 May, 2022 1 commit
  8. 21 May, 2022 1 commit
    • Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without a `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to specify which decoder to use.
        - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
        - So that the arguments common to both audio and video come before the rest, the order of the arguments is changed.
        - Also, the `dtype` and `format` arguments were changed to make them consistent across the audio/video methods.
      
      ## Code structure
      
      The approach is very similar to how file-like objects are supported in sox-based I/O.
      In the Streaming API, if the input `src` is a string, it is passed to the implementation bound with TorchBind;
      if `src` has a `read` attribute, it is passed to the same implementation bound via PyBind11.
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
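
The dispatch rule described above can be sketched in plain Python. This is only an illustration of the duck-typing check, not the TorchAudio implementation: `ChunkedSource` and `describe_source` are hypothetical names, and the returned strings merely label which path the text says a source would take.

```python
import io


class ChunkedSource:
    """Hypothetical minimal file-like object: implements only `read(n)`."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to `n` bytes; an empty result signals end of stream.
        return self._buffer.read(n)


def describe_source(src) -> str:
    """Strings take one path, objects with a `read` attribute take the other."""
    if isinstance(src, str):
        return "path"  # per the text: handled by the TorchBind-bound implementation
    if hasattr(src, "read"):
        # `seek` is optional; without it some formats may need an explicit decoder.
        return "file-like (seekable)" if hasattr(src, "seek") else "file-like (read-only)"
    raise TypeError(f"unsupported source type: {type(src).__name__}")
```

With this sketch, `describe_source("a.mp4")` reports the path branch, while `ChunkedSource` (no `seek`) and `io.BytesIO` (with `seek`) land in the two file-like cases.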
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some of the implementation in the original TorchBind surface layer is converted to a wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
        - `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` types were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
  9. 15 May, 2022 1 commit
    • [codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best-effort to move associated
      comments to match merged elements, but there are known limitations due to
      the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
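
The difference between the two orderings can be seen with Python's built-in `sorted`. This is a minimal sketch of case-sensitive versus case-insensitive sorting in general, not µsort's actual implementation; the sample names are made up.

```python
names = ["frog", "Zebra", "FROG", "apple", "Frog"]

# Default string comparison is case-sensitive: uppercase letters sort first.
case_sensitive = sorted(names)

# A case-insensitive key keeps all spellings of "frog" adjacent.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)    # ['FROG', 'Frog', 'Zebra', 'apple', 'frog']
print(case_insensitive)  # ['apple', 'frog', 'FROG', 'Frog', 'Zebra']
```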
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
      
      Reviewed By: lisroach
      
      Differential Revision: D36402214
      
      fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
  10. 13 May, 2022 1 commit
    • Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of prototype module.
      
      * The related classes are renamed as follows:
      
        - `Streamer` -> `StreamReader`.
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change is a preemptive measure for the possible addition of a
      `StreamWriter` API.
      
      * Replace BUILD_FFMPEG build arg with USE_FFMPEG
      
      We are not building FFmpeg, so USE_FFMPEG is more appropriate.
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
  11. 12 May, 2022 2 commits
    • Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680
      Zhaoheng Ni authored
      Summary:
      - When cropping the waveform and the corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, the value can sometimes be negative, which results in an empty label. After zero-padding, such a training example hurts performance (i.e., the labels are all zero for the input waveform).
      This PR fixes the bug by checking whether `label_start` is negative and setting it to zero if so.
      - If `pad` is True, `length` should be the length of each waveform instead of the max length. This PR fixes it so that the model ignores the padding component in pre-training.
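
The first fix can be sketched without PyTorch, since Python's `//` floors the same way as `rounding_mode="floor"`. This is an illustration rather than the recipe's code: `label_start_index` is a hypothetical helper, and the values used below (`kernel_size=25`, `stride=20`, `sample_rate=16`, i.e. milliseconds and samples per millisecond) are assumptions chosen for the example.

```python
def label_start_index(audio_start: int, kernel_size: int, stride: int, sample_rate: int) -> int:
    """Map a waveform crop offset (in samples) to the index of the first label frame."""
    # Python's `//` floors toward negative infinity, like rounding_mode="floor".
    raw = (audio_start - kernel_size * sample_rate) // (stride * sample_rate)
    # A crop that starts inside the first kernel window gives a negative index;
    # clamp it to zero so the label slice is not empty.
    return max(raw, 0)


# Why a negative start is harmful: a slice such as labels[-1:4] is empty.
labels = list(range(10))
empty_slice = labels[-1:4]
```

For instance, with the assumed values a crop starting at sample 100 yields a raw index of -1, which the clamp turns into 0, while a crop starting at sample 1040 maps to label index 2.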
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2296
      
      Reviewed By: mthrok
      
      Differential Revision: D36323217
      
      Pulled By: nateanl
      
      fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
    • [black][codemod] formatting changes from black 22.3.0 · 595dc5d3
      John Reese authored
      Summary:
      Applies the black-fbsource codemod with the new build of pyfmt.
      
      paintitblack
      
      Reviewed By: lisroach
      
      Differential Revision: D36324783
      
      fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
  12. 11 May, 2022 1 commit
    • Refactor LibriSpeech Conformer RNN-T recipe (#2366) · 69467ea5
      hwangjeff authored
      Summary:
      Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
      - Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module).
      - Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms).
      - Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
      - Modifies the training script to allow specifying the path to a model checkpoint from which to restart training.
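
The pickling issue behind the lambda-to-`partial` change can be reproduced with the standard library alone. The snippet below is a generic sketch, not code from the recipe; `parse_binary` and the use of `int` as the wrapped callable are arbitrary choices for illustration.

```python
import pickle
from functools import partial

# A lambda has no importable name, so pickle cannot serialize it.
parse_binary_lambda = lambda text: int(text, base=2)
try:
    pickle.dumps(parse_binary_lambda)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False

# The `partial` equivalent wraps a named callable and pickles cleanly.
parse_binary = partial(int, base=2)
restored = pickle.loads(pickle.dumps(parse_binary))

print(lambda_picklable)  # False
print(restored("101"))   # 5
```

This matters wherever callables are sent to another process, for example to dataloader workers started with the `spawn` method, which relies on pickling.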
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2366
      
      Reviewed By: mthrok
      
      Differential Revision: D36305028
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba
  13. 28 Apr, 2022 1 commit
  14. 26 Apr, 2022 1 commit
  15. 22 Apr, 2022 1 commit
    • Introduce DistributedBatchSampler (#2299) · 6411c9ad
      Zhaoheng Ni authored
      Summary:
      When using a customized `batch_sampler`, pytorch_lightning can't wrap the distributed sampler around it. Hence we provide a `DistributedBatchSampler` that supports `BucketizeBatchSampler` in `ddp` mode.
      
      The `DistributedBatchSampler` assumes `BucketizeBatchSampler.iter_list` is a list of lists, where each sub-list contains a batch of indices. Setting `shuffle` to `True` will shuffle the lists based on `seed` and current `epoch`.
      
      The shuffle happens only at initialization and is not repeated unless the user resets it. The reason is that a shuffled `BucketizeBatchSampler` may have a different length than before, so shuffling in ``__iter__`` may result in a mismatch between ``__len__`` and the real length.
      Hence users need to set `reload_dataloaders_every_n_epochs=1` in pytorch_lightning's Trainer. Then the value of ``__len__`` and the real length are the same.
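
The idea of sharding pre-built batches at construction time can be sketched in plain Python. This is a simplified illustration, not the recipe's `DistributedBatchSampler`: the class name `SimpleDistributedBatchSampler` is hypothetical, and dropping leftover batches so that all ranks get the same count is an assumption of this sketch.

```python
import random
from typing import Iterator, List


class SimpleDistributedBatchSampler:
    """Hypothetical, torch-free sketch: shard pre-built batches across replicas."""

    def __init__(self, batches: List[List[int]], num_replicas: int, rank: int,
                 shuffle: bool = True, seed: int = 0, epoch: int = 0):
        batches = list(batches)
        if shuffle:
            # Seeding with seed + epoch gives every rank the same permutation.
            random.Random(seed + epoch).shuffle(batches)
        # Drop the tail so that every rank receives the same number of batches.
        usable = len(batches) - len(batches) % num_replicas
        # Sharding happens once, here, so __len__ always matches __iter__.
        self._batches = batches[:usable][rank::num_replicas]

    def __iter__(self) -> Iterator[List[int]]:
        return iter(self._batches)

    def __len__(self) -> int:
        return len(self._batches)
```

Because nothing is reshuffled inside `__iter__`, a new epoch's ordering requires constructing a new sampler, which is the role the text assigns to reloading the dataloaders every epoch.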
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2299
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35781538
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6e8396615497f1aeddab1ee5678830c0445c2b2a
  16. 21 Apr, 2022 1 commit
    • Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29
      hwangjeff authored
      Summary:
      PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
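
The general pattern of swapping a namedtuple for a plain tuple plus accessor helpers looks roughly like this. The sketch is illustrative only: the two fields (`tokens`, `score`) and the helper names are hypothetical and do not reflect the actual layout of TorchAudio's `Hypothesis`.

```python
from collections import namedtuple
from typing import List, Tuple

# Before: a custom class (namedtuple) with named fields.
HypothesisNT = namedtuple("HypothesisNT", ["tokens", "score"])

# After: a plain tuple alias; fields are reached through helper functions.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    return hypo[1]


old_style = HypothesisNT(tokens=[3, 1, 4], score=-1.5)
new_style: Hypothesis = ([3, 1, 4], -1.5)
```

The trade-off is readability for compatibility: positional access is less self-describing, so small accessor functions keep call sites legible while the container holds only built-in types.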
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2339
      
      Reviewed By: nateanl
      
      Differential Revision: D35806529
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
  17. 13 Apr, 2022 2 commits
    • Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b
      hwangjeff authored
      Summary:
      Adds Conformer RNN-T LibriSpeech training recipe to examples directory.
      
      Produces a 30M-parameter model that achieves the following WER:
      
      |                     |          WER |
      |:-------------------:|-------------:|
      | test-clean          |       0.0310 |
      | test-other          |       0.0805 |
      | dev-clean           |       0.0314 |
      | dev-other           |       0.0827 |
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2329
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35578727
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d
    • Add nightly build installation code snippet to prototype feature tutorials (#2325) · fb51cecc
      hwangjeff authored
      Summary:
      Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab because its runtime does not have nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly PyTorch and TorchAudio builds as a comment that users can copy and run locally.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2325
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35597753
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c
  18. 05 Apr, 2022 1 commit
  19. 04 Apr, 2022 2 commits
  20. 01 Apr, 2022 1 commit
  21. 25 Mar, 2022 1 commit
  22. 24 Mar, 2022 2 commits
  23. 22 Mar, 2022 1 commit
    • Fix calculation of SNR value in tutorial (#2285) · 8395fe65
      Hagen Wierstorf authored
      Summary:
      The calculation of the SNR in the data augmentation examples seems to be wrong to me:
      
      ![image](https://user-images.githubusercontent.com/173624/159487032-c60470c6-ef8e-48a0-ad5e-a117fcb8d606.png)
      
      If we start from the definition of the signal-to-noise ratio using the root mean square value we get:
      
      ```
      SNR = 20 log10 ( rms(scale * speech) / rms(noise) )
      ```
      this can be transformed to
      ```
      scale = 10^(SNR/20) rms(noise) / rms(speech)
      ```
      The example uses `lambda x: x.norm(p=2)` rather than `rms`, but since the speech and noise signals have the same length, we have
      ```
      rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2)
      ```
      For the `e^(SNR / 10)` factor used in the example to be correct, we would need:
      ```
      10^(SNR/20) = e^(SNR / 10)
      ```
      which is not true.
      
      Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`.
      
      For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41.
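
The derivation can be checked numerically with the standard library. The sketch below is not the tutorial's code: `rms`, `snr_scale`, and `measured_snr_db` are hypothetical helpers, and the toy signals are made-up values.

```python
import math


def rms(signal):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def snr_scale(speech, noise, snr_db):
    """Gain applied to `speech` so that the speech-to-noise ratio equals `snr_db`."""
    return 10 ** (snr_db / 20) * rms(noise) / rms(speech)


def measured_snr_db(speech, noise):
    return 20 * math.log10(rms(speech) / rms(noise))


# Toy signals of equal length (made-up values for illustration only).
speech = [0.5, -0.25, 0.75, -0.5]
noise = [0.1, -0.2, 0.05, 0.15]

for snr_db in (20, 10, 3):
    scale = snr_scale(speech, noise, snr_db)
    scaled = [scale * x for x in speech]
    # The measured SNR of the scaled speech matches the requested value.
    assert math.isclose(measured_snr_db(scaled, noise), snr_db)
    # Corrected factor 10^(SNR/20) next to the previous factor e^(SNR/10).
    print(snr_db, round(10 ** (snr_db / 20), 2), round(math.exp(snr_db / 10), 2))
```

The printed pairs are 10.0 and 7.39, 3.16 and 2.72, and 1.41 and 1.35, matching the values quoted above.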
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2285
      
      Reviewed By: nateanl
      
      Differential Revision: D35047737
      
      Pulled By: mthrok
      
      fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3
  24. 17 Mar, 2022 1 commit
  25. 10 Mar, 2022 1 commit
  26. 08 Mar, 2022 1 commit
  27. 26 Feb, 2022 1 commit
    • Improve device streaming (#2202) · 365313ed
      moto authored
      Summary:
      This commit adds a tutorial for device ASR and updates the API for device streaming.

      The changes to the interface are:
      1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods.
      2. Move `fill_buffer` method to private.
      
      When dealing with a device stream, there are situations where the device buffer is not
      ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
      `process_packet` method raised an exception in the Python layer, but for device ASR,
      this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
      The new `timeout` parameter serves this purpose.
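
The retry-with-timeout idea can be sketched in Python, even though the commit implements it in the C++ layer. Everything here is illustrative: `read_with_retry` and `FlakyDevice` are hypothetical names, and expressing `timeout` and `backoff` in seconds is an assumption of this sketch, not a statement about the units used by `process_packet`.

```python
import time


def read_with_retry(read_once, timeout=None, backoff=0.01):
    """Call `read_once` until it succeeds, retrying while the device is not ready.

    `timeout` and `backoff` are in seconds here (an assumption of this sketch);
    `timeout=None` retries indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_once()
        except BlockingIOError:  # Python's exception for EAGAIN / EWOULDBLOCK
            if deadline is not None and time.monotonic() >= deadline:
                raise
            time.sleep(backoff)


class FlakyDevice:
    """Hypothetical device whose buffer is not ready for the first few reads."""

    def __init__(self, failures: int):
        self.failures = failures

    def read(self) -> bytes:
        if self.failures > 0:
            self.failures -= 1
            raise BlockingIOError("device buffer not ready")
        return b"packet"
```

With a generous timeout, `read_with_retry(FlakyDevice(3).read, timeout=1.0, backoff=0.001)` returns the packet after three retries; with `timeout=0` the first `BlockingIOError` propagates to the caller, which corresponds to the earlier raise-immediately behaviour.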
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2202
      
      Reviewed By: nateanl
      
      Differential Revision: D34475829
      
      Pulled By: mthrok
      
      fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
  28. 24 Feb, 2022 1 commit
  29. 23 Feb, 2022 1 commit
  30. 17 Feb, 2022 2 commits
  31. 16 Feb, 2022 1 commit