1. 01 Feb, 2023 1 commit
  2. 31 Jan, 2023 1 commit
    • Remove unnecessary AVFrame allocation (#3021) · 0709cadc
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3021
      
      When the input format and the encode format differ in StreamWriter, a filter for format conversion is inserted.

      A temporary AVFrame (`dst_frame`) is used for this case,
      but FilterGraph handles the memory allocation,
      so there is no need to perform the allocation ourselves.

      This `dst_frame` is not used otherwise, so we do not have to allocate memory for it at all.
      This commit removes the unnecessary memory allocation.
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42865042
      
      fbshipit-source-id: 2673b06de1e905dc73a11e2ec1cc6ce7b525d451
  3. 30 Jan, 2023 2 commits
    • Fix hybrid demucs tutorial for CUDA (#3017) · da9d1627
      Yan Li authored
      Summary:
      Currently, this tutorial raises a few errors when it is run on a CUDA device.

      The reasons are:
      - The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU).
      - When analyzing and displaying the output audio, we need to move the Tensors back from the GPU to the CPU, because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).
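
      The two fixes can be sketched as follows. This is a minimal illustration, assuming PyTorch is installed; the tensor shape and the variable names are made up for the example and are not taken from the tutorial. It falls back to the CPU when no CUDA device is present.

      ```python
      import torch

      # Pick CUDA when available; fall back to CPU so the sketch runs anywhere.
      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

      # Stand-in for the source audio (channels x samples); the shape is illustrative.
      waveform = torch.zeros(2, 16000)

      # `Tensor.to()` returns a tensor instead of modifying `waveform` in place,
      # so the result has to be assigned back to the variable.
      waveform = waveform.to(device)

      # Move the data back to the CPU before passing it to CPU-only analysis code.
      waveform_cpu = waveform.cpu()
      ```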
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3017
      
      Reviewed By: mthrok
      
      Differential Revision: D42828526
      
      Pulled By: nateanl
      
      fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762
    • Add get_build_config ffmpeg utility function (#3014) · 635d8cff
      moto authored
      Summary:
      We often need to look at which FFmpeg was found and linked when debugging an issue.
      
      The version number is often not enough, and there is no easy way to find where the library was located either.

      This commit adds a utility function that prints the build-time configuration.

      It helps to distinguish whether the linked FFmpeg is the one from the binary distribution built in CI or one built locally.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3014
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42794952
      
      Pulled By: mthrok
      
      fbshipit-source-id: 91ed358fde8cfe9d6d950f34742b1722e729cf4e
  4. 27 Jan, 2023 3 commits
  5. 26 Jan, 2023 3 commits
  6. 24 Jan, 2023 1 commit
  7. 23 Jan, 2023 3 commits
  8. 22 Jan, 2023 1 commit
    • Make StreamReader return PTS (#2975) · 0dd59e0d
      moto authored
      Summary:
      This commit makes `StreamReader` report PTS (presentation time stamp) of the returned chunk as well.
      
      Example
      
      ```python
      from torchaudio.io import StreamReader
      
      s = StreamReader(...)
      s.add_video_stream(...)
      for (video_chunk, ) in s.stream():
          # video_chunk behaves like a regular Tensor but carries an extra `pts` attribute
          print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk
      ```
      
      For backward compatibility, we introduce `_ChunkTensor`, a composition
      of a Tensor and metadata that works like a normal tensor in PyTorch operations.
      
      The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).
      
      It was also suggested to attach the metadata directly to the Tensor object,
      but we cannot ignore the possibility of a collision between torchaudio's metadata and new attributes introduced in
      PyTorch, so we use a Tensor subclass implementation.

      If any unexpected issue arises from a metadata attribute name collision, client code can
      fetch the bare Tensor and continue.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2975
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42526945
      
      Pulled By: mthrok
      
      fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35
  9. 20 Jan, 2023 4 commits
  10. 19 Jan, 2023 3 commits
    • Add modularized SSL training recipe (#2876) · 2eaefe27
      Zhaoheng Ni authored
      Summary:
      TorchAudio currently has one training recipe for HuBERT + LibriSpeech pre-training. It may not be suitable when users want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). This PR addresses the issue by providing a modularized training recipe for audio self-supervised learning. Users can inject a customized model module, loss function, optimizer, lr scheduler, and datamodule for training an SSL model.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2876
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42617414
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
    • Simplify train step in Conformer RNN-T LibriSpeech recipe (#2981) · c6a52355
      hwangjeff authored
      Summary:
      In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2981
      
      Reviewed By: mthrok
      
      Differential Revision: D42507228
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
    • Make lengths optional for additive noise operators (#2977) · bb077284
      hwangjeff authored
      Summary:
      For greater flexibility, this PR makes the `lengths` argument optional for `add_noise` and `AddNoise`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2977
      
      Reviewed By: nateanl
      
      Differential Revision: D42484211
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 54757dcc73df194bb98c1d9d42a2f43f3027b190
  11. 17 Jan, 2023 2 commits
    • Fix buffer flushing mechanism · 51731bf9
      Moto Hira authored
      Summary:
      When buffered data were cleared from ChunkedBuffer,
      the `num_buffered_frames` variable was not updated.
      
      This commit fixes that.
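
      The kind of bug described above can be illustrated with a small pure-Python sketch. `ChunkBufferSketch` is a hypothetical stand-in written for this example, not the actual C++ `ChunkedBuffer`; it only shows why the frame counter has to be reset together with the data.

      ```python
      class ChunkBufferSketch:
          """Hypothetical stand-in for a frame buffer; not the torchaudio C++ class."""

          def __init__(self):
              self.chunks = []
              self.num_buffered_frames = 0

          def push(self, chunk):
              self.chunks.append(chunk)
              self.num_buffered_frames += len(chunk)

          def flush(self):
              self.chunks.clear()
              # Reset the counter together with the data; skipping this step leaves
              # the counter reporting frames that no longer exist.
              self.num_buffered_frames = 0


      buf = ChunkBufferSketch()
      buf.push([0.1, 0.2, 0.3])
      buf.push([0.4, 0.5])
      frames_before = buf.num_buffered_frames
      buf.flush()
      frames_after = buf.num_buffered_frames
      ```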
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42538519
      
      fbshipit-source-id: a24a9afcebebd8956d977f05e9c2f0b603d060d1
    • Fix mel spectrogram visualization in TTS tutorial (#2989) · b983c665
      Zhaoheng Ni authored
      Summary:
      The mel spectrograms in the TTS tutorial are upside down. This PR fixes it by passing `origin="lower"` to `imshow`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2989
      
      Reviewed By: mthrok
      
      Differential Revision: D42538349
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4388103a49bdfabf1705c1f979d44ecedd5c910a
  12. 16 Jan, 2023 4 commits
    • Refactor buffer common utils (#2988) · e259f156
      moto authored
      Summary:
      Split `convert_video` into a memory allocation function and a write function.

      Also put all the buffer implementations into the `detail` namespace.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2988
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42536769
      
      Pulled By: mthrok
      
      fbshipit-source-id: 36fbf437d4bfd521322846161ae08a48c782c540
    • Fixes examples/source_separation for WSJ0_2mix dataset (#2987) · f9d38796
      Robin Scheibler authored
      Summary:
      The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following:

      1. Uses `wsj0mix` consistently as the keyword indicating the WSJ0_2mix dataset
      2. Corrects `args.data_dir` to `args.root_dir` in eval.py
      3. Modifies the parameters of `pytorch_lightning.Trainer` according to the latest version (uses `accelerator="gpu"` and `devices=args.num_devices` instead of just `gpus=args.num_devices`)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2987
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42536992
      
      Pulled By: nateanl
      
      fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76
    • Refactor chunked buffer implementation (#2984) · 52b6bc3b
      moto authored
      Summary:
      Refactor the chunked buffer so that the number of Tensor frames stored in buffers is always a multiple of `frames_per_chunk`.

      This makes it easy to store PTS values in an aligned manner.
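
      A rough pure-Python sketch of why chunk-aligned storage simplifies PTS bookkeeping follows. `AlignedChunkBuffer`, `first_pts` and `pts_step` are hypothetical names invented for this example and do not mirror the C++ implementation; the point is only that one PTS per chunk slot is enough when every slot covers exactly one chunk.

      ```python
      from collections import deque


      class AlignedChunkBuffer:
          """Hypothetical sketch: frames are stored in fixed-size chunk slots."""

          def __init__(self, frames_per_chunk):
              self.frames_per_chunk = frames_per_chunk
              self.chunks = deque()  # each entry holds at most frames_per_chunk frames
              self.pts = deque()     # pts[i] is the PTS of the first frame in chunks[i]

          def push(self, frames, first_pts, pts_step):
              for i, frame in enumerate(frames):
                  # Open a new chunk slot whenever the last one is full.
                  if not self.chunks or len(self.chunks[-1]) == self.frames_per_chunk:
                      self.chunks.append([])
                      self.pts.append(first_pts + i * pts_step)
                  self.chunks[-1].append(frame)

          def pop_chunk(self):
              return self.chunks.popleft(), self.pts.popleft()


      buf = AlignedChunkBuffer(frames_per_chunk=4)
      buf.push(list(range(6)), first_pts=0.0, pts_step=0.5)      # frames 0..5
      buf.push(list(range(6, 10)), first_pts=3.0, pts_step=0.5)  # frames 6..9
      first_chunk, first_chunk_pts = buf.pop_chunk()
      second_chunk, second_chunk_pts = buf.pop_chunk()
      ```

      Because the second chunk always starts at a slot boundary, its PTS is recorded once when the slot is opened, regardless of how the frames were split across `push` calls.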
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2984
      
      Reviewed By: nateanl
      
      Differential Revision: D42526670
      
      Pulled By: mthrok
      
      fbshipit-source-id: d83ee914b7e50de3b51758069b0e0b6b3ebe2e54
    • Set filter graph #threads to 1 (#2985) · 3ecf78d6
      moto authored
      Summary:
      FilterGraph supports multithreading, and by default the number of threads is determined automatically.

      Rather than rely on this automatic behavior, which is unpredictable, it is better to fix the number of threads to 1.
      
      Follow-up: Add an interface to adjust it.
      
      Similar to https://github.com/pytorch/audio/pull/2949.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2985
      
      Reviewed By: nateanl
      
      Differential Revision: D42526958
      
      Pulled By: mthrok
      
      fbshipit-source-id: c4f7f95317e93a39378107636a3ca30f6ddfe466
  13. 15 Jan, 2023 1 commit
    • Add pre-trained pipelines for XLS-R models (#2978) · 9b7b64e4
      Zhaoheng Ni authored
      Summary:
      This PR adds three `Wav2Vec2Bundle` pipeline objects for XLS-R models:
      - WAV2VEC2_XLSR_300M
      - WAV2VEC2_XLSR_1B
      - WAV2VEC2_XLSR_2B
      
      All three models use layer normalization in the feature extraction layers, hence `_normalize_waveform` is set to `True`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2978
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42501491
      
      Pulled By: nateanl
      
      fbshipit-source-id: 2429ec880cc14798034843381e458e1b4664dac3
  14. 14 Jan, 2023 1 commit
  15. 13 Jan, 2023 2 commits
  16. 12 Jan, 2023 4 commits
    • Refactor extension modules initialization (#2968) · 5dfe0b22
      mthrok authored
      Summary:
      * Refactor the `_extension` module so that
        * the implementation of the initialization logic and its execution are separated.
          * the logic goes to `_extension.utils`
          * the execution is at `_extension.__init__`
          * global variables are defined and modified in `__init__`.
      * Replace `is_sox_available()` with `_extension._SOX_INITIALIZED`
      * Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE`
      * Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`.
      * Merge the sox-related initialization logic in the `_extension.utils` module.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2968
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42387251
      
      Pulled By: mthrok
      
      fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
    • Add query methods to FilterGraph (#2976) · 32d46f94
      moto authored
      Summary:
      This commit adds methods to query the output configuration from a FilterGraph object.
      * time_base -> required to compute the PTS of an output frame
      * sample_rate, num_channels -> required to compute PTS and pre-allocate buffers for audio.
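
      To illustrate how these values feed into a PTS computation, here is a small pure-Python sketch. The helper names and the concrete numbers are assumptions made for the example; they are not part of FilterGraph or of the FFmpeg API. It treats the time base as a rational number of seconds per tick.

      ```python
      from fractions import Fraction


      def pts_to_seconds(pts, time_base):
          """Convert an integer PTS counted in time_base ticks to seconds."""
          return float(pts * time_base)


      def audio_sample_pts(first_pts, num_prior_samples, sample_rate, time_base):
          """PTS (in time_base ticks) of the sample num_prior_samples after first_pts."""
          return first_pts + Fraction(num_prior_samples, sample_rate) / time_base


      # Illustrative values only: a 90 kHz video time base and 44.1 kHz audio.
      video_time_base = Fraction(1, 90000)
      seconds = pts_to_seconds(180000, video_time_base)

      audio_time_base = Fraction(1, 44100)
      chunk_pts = audio_sample_pts(0, 4410, 44100, audio_time_base)
      ```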
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2976
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42466744
      
      Pulled By: mthrok
      
      fbshipit-source-id: dd27109819bfb1fbe37b8233dd6a5e4224fe3f6c
    • Add `buffer_chunk_size=-1` option (#2969) · 22788a8f
      moto authored
      Summary:
      This commit adds `buffer_chunk_size=-1`, which does not drop buffered frames.
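
      The difference between a bounded buffer and the `-1` setting can be sketched in pure Python. `make_chunk_buffer` is a hypothetical helper for this example that models the buffer as a `deque`; it is not how the C++ buffer is implemented.

      ```python
      from collections import deque


      def make_chunk_buffer(buffer_chunk_size):
          """Return a deque that keeps the newest chunks; -1 means keep everything."""
          maxlen = None if buffer_chunk_size == -1 else buffer_chunk_size
          return deque(maxlen=maxlen)


      bounded = make_chunk_buffer(3)
      unbounded = make_chunk_buffer(-1)
      for chunk_index in range(5):
          bounded.append(chunk_index)
          unbounded.append(chunk_index)

      kept_bounded = list(bounded)      # the two oldest chunks were dropped
      kept_unbounded = list(unbounded)  # nothing was dropped
      ```

      Keeping everything trades memory for completeness, so the unbounded setting suits cases where the consumer is known to keep up or where no frame may be lost.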
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2969
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42403467
      
      Pulled By: mthrok
      
      fbshipit-source-id: a0847e6878874ce7e4b0ec3f56e5fbb8ebdb5992
    • Update C++ standard to 17 (#2973) · d1cc1da6
      moto authored
      Summary:
      Following the change in PyTorch core.
      
      https://github.com/pytorch/pytorch/commit/87e4a087784c805312a2b48bb063d2400df26c5e
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2973
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42462709
      
      Pulled By: mthrok
      
      fbshipit-source-id: 60c2aa3d63fe25d8e0b7aa476404e7a55d6eb87f
  17. 11 Jan, 2023 1 commit
  18. 10 Jan, 2023 2 commits
  19. 06 Jan, 2023 1 commit