1. 12 Dec, 2022 1 commit
    • moto's avatar
      Update precise seek behavior for t=0 (#2915) · cbd35438
      moto authored
      Summary:
      It was reported that when videos with invalid PTS values are fed to StreamReader, StreamReader returns only the last frame.
      
      https://github.com/pytorch/vision/blob/677fc939b21a8893f07db4c1f90482b648b6573f/test/assets/videos/RATRACE_wave_f_nm_np1_fr_goo_37.avi
      
      ```
      import torchaudio
      
      src = "RATRACE_wave_f_nm_np1_fr_goo_37.avi"
      
      streamer = torchaudio.io.StreamReader(src=src)
      streamer.add_basic_video_stream(frames_per_chunk=-1)
      streamer.process_all_packets()
      video, = streamer.pop_chunks()
      
      print(video.size(0))  # prints 1, but there are more than 70 frames
      ```
      
      The reason all the frames are not returned is the invalid PTS values: every frame's PTS is `-9223372036854775808` (FFmpeg's `AV_NOPTS_VALUE`), so the internal discard mechanism drops them all.
      
      The reason the last frame is output is that, when entering drain mode, a discard value of -1 is used, which is interpreted as "no discard".
      
      For the second issue, the discard behavior should be consistent across regular decoding and drain mode.
      
      For the first issue, although normal behavior is not guaranteed for such invalid input, we can support the case where one reads the video from the start (or seeks to t=0).
      
       ---
      
      This commit makes the following changes to address the two issues above.
      
      1. Define the `discard_before_pts` attribute on StreamProcessor, so that StreamProcessor is aware of the discard behavior without being told by StreamReader, and its behavior is consistent between regular decoding and drain mode.
      
         This gets rid of the `discard_before_pts` computation that currently happens every time a frame is processed, so this should improve performance a bit.
      
      2. Change the meaning of `discard_before_pts` so that when it is 0, no discard happens. With this change, negative values are no longer necessary, so they are given undefined-behavior status.
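
      A minimal pure-Python sketch of the new semantics (the names `filter_frames` and `INT64_MIN` are illustrative, not the actual C++ identifiers in StreamProcessor):

      ```python
      INT64_MIN = -(2 ** 63)  # FFmpeg's AV_NOPTS_VALUE, the invalid PTS seen above

      def filter_frames(pts_list, discard_before_pts):
          # New semantics: 0 means "no discard", so frames carrying the
          # invalid sentinel PTS survive when reading from the start or
          # seeking to t=0. Negative discard_before_pts is now undefined
          # behavior and is not handled here.
          if discard_before_pts == 0:
              return list(pts_list)
          return [pts for pts in pts_list if pts >= discard_before_pts]
      ```

      Under the old semantics, frames with PTS `INT64_MIN` would compare below any non-negative threshold and be dropped; with the new rule, a threshold of 0 keeps everything.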
      
      Note:
         Even with these changes, seeking in videos with invalid PTS is not feasible. Client code can implement a fallback that decodes all frames first and discards the undesired ones.
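
      Such a fallback could look like the following sketch, assuming the nominal frame rate is trustworthy even when the PTS values are not (`drop_before` is a hypothetical helper, not a torchaudio API):

      ```python
      def drop_before(frames, frame_rate, start_sec):
          # Decode everything first, then emulate a seek to start_sec by
          # discarding leading frames based on index and nominal frame rate,
          # instead of relying on the (invalid) PTS values.
          start_index = int(start_sec * frame_rate)
          return frames[start_index:]
      ```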
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2915
      
      Reviewed By: nateanl
      
      Differential Revision: D41957784
      
      Pulled By: mthrok
      
      fbshipit-source-id: 2dafdbada5aa33bfc81c986306f80642ba6277df
      cbd35438
  2. 11 Dec, 2022 1 commit
  3. 10 Dec, 2022 2 commits
  4. 09 Dec, 2022 5 commits
    • Zhaoheng Ni's avatar
      Fix integration test for WAV2VEC2_ASR_LARGE_LV60K_10M (#2910) · 90162812
      Zhaoheng Ni authored
      Summary:
      After https://github.com/pytorch/audio/issues/2873, the pre-trained Wav2Vec2 models with larger datasets achieve better performance. The PR fixes the integration test of bundle `WAV2VEC2_ASR_LARGE_LV60K_10M`, which previously predicted the word `CURIOUSITY` as `CURIOUSSITY` but now predicts `CURIOUSITY` correctly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2910
      
      Reviewed By: mthrok
      
      Differential Revision: D41881919
      
      Pulled By: nateanl
      
      fbshipit-source-id: 236fd00b983a5205c731f3efa31033a6b8257cab
      90162812
    • moto's avatar
      Update author and maintainer info (#2911) · eb8b1bda
      moto authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2911
      
      Reviewed By: carolineechen
      
      Differential Revision: D41887854
      
      Pulled By: mthrok
      
      fbshipit-source-id: eb91773ec67b4cda2d70733df450956d83742509
      eb8b1bda
    • Moto Hira's avatar
      Fix duplicated memory allocation in StreamWriter (#2906) · 90c456de
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2906
      
      The correct way to create an `AVFormatContext*` for output is to pass the address of an uninitialized `AVFormatContext*` pointer to the `avformat_alloc_output_context2` function.
      
      The current code pre-allocates the `AVFormatContext*` with `avformat_alloc_context`; this pre-allocated object is then overwritten and leaked inside `avformat_alloc_output_context2`.
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D41865685
      
      fbshipit-source-id: 9a9dc83b5acfe9b450f191fe716c85ebb5a5d842
      90c456de
    • Moto Hira's avatar
      Fix wrong frame allocation in StreamWriter (#2905) · 3518df48
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2905
      
      In StreamWriter, if the tensor format is different from the encoding format, then a FilterGraph object is automatically inserted to convert the format.
      
      The FilterGraph object operates on AVFrames. The input AVFrame must be allocated by us, but the output AVFrame is filled by FilterGraph, so there is no need to allocate it.
      
      The output AVFrame is then used as input to the encoder regardless of whether a FilterGraph was inserted. Thus the output AVFrame has to be manually allocated by us when FilterGraph is not used.
      
      The current code flips this condition and incorrectly allocates AVFrame when FilterGraph is present and does not allocate otherwise.
      
      This commit fixes that.
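
      The corrected condition can be summarized with this illustrative sketch (`needs_manual_allocation` is a hypothetical name, not the actual code):

      ```python
      def needs_manual_allocation(filter_graph_present):
          # FilterGraph fills the output AVFrame itself, so we only allocate
          # it manually when no FilterGraph is inserted. The bug was using
          # the negation of this condition.
          return not filter_graph_present
      ```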
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D41866198
      
      fbshipit-source-id: 40799c147dc8166a979ecfb58ed8e502539a6aed
      3518df48
    • atalman's avatar
      Toggle on/off ffmpeg test if needed (#2901) · ccda545c
      atalman authored
      Summary:
      Toggle the ffmpeg test on/off as needed.
      It is ON by default, so current tests are unaffected.
      No change is required to keep it ON.
      To toggle it OFF, use:
      ```
      smoke_test.py --no-ffmpeg
      ```
      
      This is to be used when calling from the builder, since we do not currently install ffmpeg there.
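
      The flag could be wired up roughly like this (a sketch using standard `argparse`; the actual `smoke_test.py` implementation may differ):

      ```python
      import argparse

      def make_parser():
          # ffmpeg tests are ON by default; passing --no-ffmpeg switches
          # them off, leaving existing invocations unaffected.
          parser = argparse.ArgumentParser()
          parser.add_argument(
              "--no-ffmpeg",
              dest="ffmpeg",
              action="store_false",
              default=True,
              help="Skip the ffmpeg smoke tests",
          )
          return parser
      ```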
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2901
      
      Reviewed By: carolineechen, mthrok
      
      Differential Revision: D41874976
      
      Pulled By: atalman
      
      fbshipit-source-id: c57b19f37c63a1f476f93a5211550e980e67d9c7
      ccda545c
  5. 08 Dec, 2022 4 commits
  6. 07 Dec, 2022 3 commits
  7. 06 Dec, 2022 1 commit
  8. 04 Dec, 2022 1 commit
  9. 02 Dec, 2022 1 commit
  10. 30 Nov, 2022 2 commits
  11. 29 Nov, 2022 5 commits
  12. 28 Nov, 2022 3 commits
  13. 19 Nov, 2022 1 commit
  14. 18 Nov, 2022 2 commits
  15. 17 Nov, 2022 4 commits
  16. 16 Nov, 2022 2 commits
  17. 15 Nov, 2022 2 commits
    • Grigory Sizov's avatar
      Add WavLM bundles (#2833) · 26f62dc5
      Grigory Sizov authored
      Summary:
      Closes T136364380, follow-up to https://github.com/pytorch/audio/issues/2822
      
      - Added "base", "base+", and "large" bundles for WavLM
      - Expanded `wav2vec2_pipeline_test.py` to include the new bundles
      - Added the new bundles to docs in `pipelines.rst`
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2833
      
      Reviewed By: nateanl
      
      Differential Revision: D41194796
      
      Pulled By: sgrigory
      
      fbshipit-source-id: bf8e96c05b6a81ac5c5a014c46adeeac12685328
      26f62dc5
    • Grigory Sizov's avatar
      Use BetterTransformer in WavLM Self-Attention (#2842) · 2d1da45c
      Grigory Sizov authored
      Summary:
      Closes T137506059
      
      Replaces the functional multi-head attention in `WavLMSelfAttention` with the module `torch.nn.MultiheadAttention`. The reason is that the latter uses a native CPU/CUDA implementation ([BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/)) under certain conditions and can achieve significant speedup. It also simplifies the code in `WavLMSelfAttention`.
      
      Note: the definition of the `bias` parameter in `WavLMSelfAttention.forward` has changed slightly, because `torch.nn.MultiheadAttention` has no parameters controlling the presence of bias for the `k`, `v`, and `q` projections independently. In WavLM we only use `bias=True`, so this has no effect for users of WavLM or for tests.
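
      As a minimal illustration of the single constructor-level `bias` flag in `torch.nn.MultiheadAttention` (the shapes here are arbitrary; this is not the WavLM code):

      ```python
      import torch

      # One bias flag covers the q/k/v projections together; there is no
      # per-projection control as in the old functional implementation.
      mha = torch.nn.MultiheadAttention(
          embed_dim=8, num_heads=2, bias=True, batch_first=True
      )
      x = torch.randn(2, 5, 8)  # (batch, sequence, embedding)
      out, attn_weights = mha(x, x, x, need_weights=True)
      print(out.shape)  # torch.Size([2, 5, 8])
      ```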
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2842
      
      Reviewed By: nateanl
      
      Differential Revision: D41186166
      
      Pulled By: sgrigory
      
      fbshipit-source-id: e791c68106ad89f96c1abf046de699cb8ec7b595
      2d1da45c