1. 02 May, 2023 1 commit
  2. 01 May, 2023 2 commits
  3. 29 Apr, 2023 1 commit
  4. 28 Apr, 2023 1 commit
    • Add cuctc decoder (#3096) · 0a1801ed
      Yuekai Zhang authored
      Summary:
      This PR implements a CUDA-based CTC prefix beam search decoder.
      
      Several benchmark results measured on a V100 are attached below:
      | decoder type | model | dataset | decoding time (s) | beam size | batch size | model unit | subsampling factor | vocab size |
      |---|---|---|---|---|---|---|---|---|
      | cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | conformer nemo | dev clean (sorted by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
      | cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |
      
      Note:
      1. The design parallelizes computation across the batch and vocab axes and loops sequentially over the frame axis. Compared with CPU implementations, it therefore favors shorter sequences and larger vocabularies.
      2. WER is the same as with the CPU implementations. However, decoding with an LM is not supported yet.
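
For reference, the algorithm named above can be sketched on the CPU for a single utterance and without an LM. This is not the PR's CUDA code: the function name `ctc_prefix_beam_search`, its arguments, and the data layout are all hypothetical. The loop over frames corresponds to the axis the note describes as sequential. The inner loop over tokens (and an outer loop over utterances, omitted here) is the kind of work a GPU version could spread across threads.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size, blank=0):
    """Hypothetical reference decoder for one utterance.

    log_probs: list of frames, each a list of log-probabilities per token.
    Returns a list of (token_ids, log_prob), best hypothesis first.
    """
    # prefix -> (log-prob of paths ending in blank, ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:  # sequential over frames
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):  # independent per token
                if token == blank:
                    # Blank keeps the prefix and ends the path in blank.
                    nb, nnb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(nb, p_b + lp, p_nb + lp), nnb)
                elif prefix and token == prefix[-1]:
                    # Repeat without a blank in between: prefix is unchanged.
                    nb, nnb = next_beams[prefix]
                    next_beams[prefix] = (nb, logsumexp(nnb, p_nb + lp))
                    # Repeat after a blank: the prefix grows by one token.
                    ext = prefix + (token,)
                    nb, nnb = next_beams[ext]
                    next_beams[ext] = (nb, logsumexp(nnb, p_b + lp))
                else:
                    # A new token always extends the prefix.
                    ext = prefix + (token,)
                    nb, nnb = next_beams[ext]
                    next_beams[ext] = (nb, logsumexp(nnb, p_b + lp, p_nb + lp))
        # Keep only the beam_size most probable prefixes.
        ranked = sorted(
            next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True
        )
        beams = dict(ranked[:beam_size])
    return [(list(prefix), logsumexp(*score)) for prefix, score in beams.items()]


# Two frames over a toy vocabulary {0: blank, 1: "a"}.
frames = [[math.log(0.4), math.log(0.6)]] * 2
best_tokens, best_log_prob = ctc_prefix_beam_search(frames, beam_size=2)[0]
# best_tokens is [1]; its probability is 0.24 + 0.24 + 0.36 = 0.84
```

Because hypotheses that collapse to the same prefix are merged at every frame, the work per frame is bounded by beam size times vocab size, which is the product a batched GPU kernel would parallelize.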
      
      Resolves: https://github.com/pytorch/audio/issues/2957.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3096
      
      Reviewed By: nateanl
      
      Differential Revision: D44709397
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
  5. 25 Apr, 2023 1 commit
  6. 19 Apr, 2023 2 commits
  7. 18 Apr, 2023 1 commit
  8. 12 Apr, 2023 3 commits
  9. 11 Apr, 2023 2 commits
  10. 10 Apr, 2023 4 commits
  11. 07 Apr, 2023 5 commits
  12. 06 Apr, 2023 2 commits
    • Remove custom flashlight import (#3246) · ae614ed3
      moto authored
      Summary:
      In https://github.com/pytorch/audio/pull/3232, the CTC decoder is excluded from binary distribution.
      To use CTCDecoder, users need to install flashlight-text.
      
      Currently, if flashlight-text is not available, torchaudio still attempts to import the custom bundle.
      This commit cleans up this behavior by delaying the error until one of the components is actually used,
      and providing a better message.
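
The deferred-error pattern described above can be sketched in plain Python. This is only an illustration, not the code in the commit: the class name `LazyModule` and the hint text are hypothetical. The idea is that constructing the placeholder never imports anything, and a missing dependency surfaces as a descriptive error only on first use.

```python
import importlib


class LazyModule:
    """Hypothetical placeholder that imports `name` on first attribute access."""

    def __init__(self, name, hint):
        self._name = name
        self._hint = hint
        self._module = None

    def __getattr__(self, attr):
        # Called only for attributes not found on the instance itself,
        # so _name, _hint and _module never reach this method.
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError as err:
                raise RuntimeError(
                    f"'{self._name}' is required for this feature. {self._hint}"
                ) from err
        return getattr(self._module, attr)


# Creating the placeholder succeeds even if the package is absent.
decoder_backend = LazyModule(
    "a_package_that_is_not_installed",  # stand-in name for a missing dependency
    "Please install the optional dependency first.",
)

# The same wrapper is transparent when the module exists.
lazy_math = LazyModule("math", "")
root = lazy_math.sqrt(16.0)  # 4.0
```

With this approach, importing the top-level package stays cheap and never fails because of an optional dependency, which matches the smoke-test scenario in the test plan.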
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3246
      
      Test Plan:
      Binary smoke tests import torchaudio without installing flashlight.
      Unit test CI jobs run the CTC decoder with flashlight installed.
      
      Reviewed By: jacobkahn
      
      Differential Revision: D44748413
      
      Pulled By: mthrok
      
      fbshipit-source-id: 21d2cbd9961ed88405a739cc682071066712f5e4
    •
      Add frame writing API to StreamWriter (#3244) · f4d94cab
      Jeff Hwang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3244
      
      Adds methods to `StreamWriter` that allow for passing in `AVFrame` instances rather than tensors.
      
      Reviewed By: mthrok
      
      Differential Revision: D44589256
      
      fbshipit-source-id: f100e0d349708482b873a9a4bae1eaf5eb65301a
  13. 05 Apr, 2023 2 commits
  14. 04 Apr, 2023 4 commits
    • [BC-breaking] Make I/O optional arguments kw-only (#3227) · ab40a3a3
      moto authored
      Summary:
      Recently, we added a bunch of options to make StreamReader/Writer flexible. As a result, their methods take many arguments, some of which form semantic groups.
      
      For example, the arguments of ``StreamWriter.add_video_stream`` are roughly grouped as follows:
      
      - Information about input media format
         `frame_rate`, `width`, `height`, `format`
      - Information about encoder
         `encoder`, `encoder_option`
      - Information about codec configuration
         `codec_config`
      - Information about encoded media format
         `encoder_format`, `encoder_frame_rate`, `encoder_width`, `encoder_height`
      - Information about additional processing
         `filter_desc`
      - Hardware acceleration
         `hw_accel`
      
      We do not know what arguments will be added in the future, but when we add them,
      we want to keep them roughly grouped by inserting each new argument
      somewhere in the middle without breaking backward compatibility.
      
      This commit makes most of them keyword-only arguments, so that we can
      rearrange them without breaking backward compatibility.
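
The effect of keyword-only arguments can be shown with a small stand-in function. The name `add_video_stream_sketch` is hypothetical and the function only records its inputs. Which parameters sit before or after the `*` marker here is an assumption for illustration, not a copy of the real `StreamWriter.add_video_stream` signature.

```python
def add_video_stream_sketch(
    frame_rate,
    width,
    height,
    format="rgb24",
    *,  # every parameter after this marker must be passed by keyword
    encoder=None,
    encoder_option=None,
    codec_config=None,
    filter_desc=None,
    hw_accel=None,
):
    """Hypothetical stand-in that returns its configuration as a dict."""
    return {
        "frame_rate": frame_rate,
        "width": width,
        "height": height,
        "format": format,
        "encoder": encoder,
        "encoder_option": encoder_option,
        "codec_config": codec_config,
        "filter_desc": filter_desc,
        "hw_accel": hw_accel,
    }


# Keyword use keeps working no matter how the optional parameters are ordered.
config = add_video_stream_sketch(30, 640, 480, encoder="libx264")

# Positional use of an optional parameter is rejected up front.
try:
    add_video_stream_sketch(30, 640, 480, "rgb24", "libx264")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Since callers can no longer depend on the position of `encoder` and the parameters after it, a new option can be inserted between any two of them without changing the meaning of existing calls.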
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3227
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44681620
      
      Pulled By: mthrok
      
      fbshipit-source-id: b55f6168f4c2f3d0f59731b9bb0db4ae54e5a90f
    •
      Disable CTC decoder bundle by default (#3232) · 3844a2bd
      moto authored
      Summary:
      As we migrate to the upstream flashlight-text and KenLM, this PR disables building the CTC decoder by default.
      As a result, the torchaudio binary will no longer ship the flashlight-text and KenLM bundle.
      
      Ref: https://github.com/pytorch/audio/issues/3088
      
      cc jacobkahn
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3232
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44650872
      
      Pulled By: mthrok
      
      fbshipit-source-id: 2415623abaf3cafa181135db5112d3c711137cd7
    •
      Swap in assertions for decoder setup checks (#3235) · ea212c6e
      hwangjeff authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3235
      
      Reviewed By: mthrok
      
      Differential Revision: D44653654
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: f28a6068e826581d76ed4a216adb6019b6486e53
    •
      Remove linux GPU unit test from CircleCI (#3231) · 0d57a3af
      moto authored
      Summary:
      The Linux GPU unit test on CircleCI relies on a custom Docker image with CUDA 10.2.
      
      PyTorch 2.0 does not support CUDA 10, so these tests have not run for a while.
      
      We have GPU tests on GHA for Linux, so the CircleCI ones can be removed.
      
      Windows GPU tests are not ported to GHA yet, but they still work on CircleCI, so we keep them for now.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3231
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44639302
      
      Pulled By: mthrok
      
      fbshipit-source-id: c1fd39f4805a50a12af4259d423985fe453fd229
  15. 03 Apr, 2023 4 commits
  16. 01 Apr, 2023 1 commit
    • Add AudioEffector (#3163) · a4036248
      moto authored
      Summary:
      This commit adds a new feature, AudioEffector, which can be used to
      apply various effects and codecs to waveform Tensors.
      
      Under the hood it uses StreamWriter and StreamReader to apply
      filters and encode/decode.
      
      This is going to replace the deprecated `apply_codec` and
      `apply_sox_effect_tensor` functions.
      
      It can also perform online, chunk-by-chunk filtering.
      
      Tutorial to follow.
      
      closes https://github.com/pytorch/audio/issues/3161
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3163
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44576660
      
      Pulled By: mthrok
      
      fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb
  17. 31 Mar, 2023 3 commits
  18. 30 Mar, 2023 1 commit
    • Support encode spec change in StreamWriter (#3207) · 1b648626
      moto authored
      Summary:
      This commit adds support for changing the spec of media
      (such as sample rate, #channels, image size and frame rate)
      on-the-fly at encoding time.
      
      The motivation behind this addition is that certain media
      formats support only a limited number of specs, and it is
      cumbersome to require client code to change the spec
      every time.
      
      For example, OPUS supports only 48kHz sampling rate, and
      vorbis only supports stereo.
      
      To make it easy to work with media of different formats,
      this commit makes it so that anything that's not compatible
      with the format is automatically converted, and allows
      users to specify the override.
      
      A notable implementation detail is that, for sample format and
      pixel format, the encoder's default value takes precedence over
      the source value, while for other attributes such as sample rate and
      #channels, the source value takes precedence as long as
      it is supported.
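
The precedence rule above can be written out as two small functions. This is a sketch under stated assumptions, not the StreamWriter implementation: the names `resolve_sample_rate` and `resolve_sample_fmt` are hypothetical. Treating an empty list as "no restriction" and falling back to the nearest supported rate are also assumptions, since the commit message only says that incompatible values are converted automatically.

```python
def resolve_sample_rate(source_rate, supported_rates, override=None):
    """Source value wins when supported; an explicit override always wins."""
    if override is not None:
        return override
    if not supported_rates or source_rate in supported_rates:
        # Assumption: an empty list means the encoder accepts any rate.
        return source_rate
    # Assumption: fall back to the closest supported rate.
    return min(supported_rates, key=lambda rate: abs(rate - source_rate))


def resolve_sample_fmt(source_fmt, encoder_default_fmt, override=None):
    """Encoder default wins over the source value; an override always wins."""
    if override is not None:
        return override
    if encoder_default_fmt is not None:
        return encoder_default_fmt
    return source_fmt


# A 16 kHz source sent to an encoder that accepts only 48 kHz is converted.
rate_converted = resolve_sample_rate(16000, [48000])  # 48000
# A supported source rate is kept as is.
rate_kept = resolve_sample_rate(44100, [44100, 48000])  # 44100
# For sample format, the encoder default is preferred over the source.
fmt_default = resolve_sample_fmt("s16", "fltp")  # "fltp"
# A user-specified override beats both.
fmt_override = resolve_sample_fmt("s16", "fltp", override="s16")  # "s16"
```

The asymmetry is the point: format-like attributes follow the encoder, while rate-like attributes follow the source whenever the encoder allows it.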
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3207
      
      Reviewed By: nateanl
      
      Differential Revision: D44439622
      
      Pulled By: mthrok
      
      fbshipit-source-id: 09524f201d485d201150481884a3e9e4d2aab081