"...linux/git@developer.sourcefind.cn:OpenDAS/torchaudio.git" did not exist on "95d9f2d272b2814010db7fc803a7d4dc6cf0c3b4"
  1. 10 Jul, 2023 1 commit
  2. 05 Jul, 2023 1 commit
  3. 21 Jun, 2023 1 commit
  4. 14 Jun, 2023 1 commit
  5. 09 Jun, 2023 1 commit
  6. 08 Jun, 2023 2 commits
    • Introduce chroma filter bank function (#3395) · dfd0c5fd
      Jeff Hwang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3395
      
      Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.
      
      Reviewed By: mthrok
      
      Differential Revision: D46307672
      
      fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5
    • Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca
      moto authored
      Summary:
      The StreamReader decoding process is composed of three steps:
      
      1. Decode the incoming AVPacket into an AVFrame.
      2. Pass the AVFrame through AVFilter for post-processing.
      3. Convert the resulting AVFrame.
      
      The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, which allows the output stream shape to be retrieved.
      
      For the CPU decoder, this works fine because resizing happens in step 2, so the resulting shape can be retrieved.
      However, this is problematic for the GPU decoder: resizing is currently done through a GPU decoder option (step 1), and there seems to be no interface for retrieving the output shape. The refactor therefore introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405
      
      AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly: it waits until the first frame arrives before finalizing the frame shape.
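
The deferred initialization described above can be sketched in plain Python. This is only an illustration of the idea, not the torchaudio implementation: `LazyConverter` and the list-of-rows "frame" are hypothetical stand-ins for the real converter and AVFrame.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # toy stand-in for a decoded frame: rows of pixel values


class LazyConverter:
    """Hypothetical converter that learns the frame shape from the first frame."""

    def __init__(self) -> None:
        # The shape is unknown when the output stream is defined.
        self.shape: Optional[Tuple[int, int]] = None

    def convert(self, frame: Frame) -> Frame:
        height, width = len(frame), len(frame[0])
        if self.shape is None:
            # First frame: finalize the shape from the actual data.
            self.shape = (height, width)
        elif self.shape != (height, width):
            raise ValueError(f"expected shape {self.shape}, got {(height, width)}")
        # Return a copy so the caller's frame is left untouched.
        return [row[:] for row in frame]
```

With this design the shape cannot be queried before the first frame is decoded, which is the trade-off the commit accepts for the GPU path.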
      
      Fix https://github.com/pytorch/audio/issues/3405
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3419
      
      Differential Revision: D46557505
      
      Pulled By: mthrok
      
      fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
  7. 07 Jun, 2023 1 commit
  8. 06 Jun, 2023 3 commits
  9. 02 Jun, 2023 1 commit
    • [BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5
      moto authored
      Summary:
      This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.
      
      The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation instead of reimplementing it in PyTorch.
      
      The Kaldi integration employed a hack that replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one BLAS library within torchaudio.
      
      Recently, we have been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.
      
      See some of the discussion in https://github.com/pytorch/audio/issues/1269
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3368
      
      Differential Revision: D46406176
      
      Pulled By: mthrok
      
      fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
  10. 01 Jun, 2023 3 commits
  11. 30 May, 2023 1 commit
  12. 27 May, 2023 1 commit
    • Fix AudioEffector for mulaw (#3372) · af932cc7
      moto authored
      Summary:
      When encoding audio with mulaw, the resulting data does not have a header, and StreamReader defaults to 16 kHz, which can stretch or shrink the resulting waveform.
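
A small worked example of why a wrong default sample rate changes the apparent length of headerless audio. The 8 kHz source rate is an assumption chosen for illustration; `playback_duration` is a hypothetical helper.

```python
def playback_duration(num_samples: int, assumed_rate: int) -> float:
    """Duration in seconds when num_samples are interpreted at assumed_rate."""
    return num_samples / assumed_rate


# Two seconds of audio sampled at 8 kHz (illustrative value).
num_samples = 8000 * 2

true_duration = playback_duration(num_samples, 8000)      # interpreted correctly
misread_duration = playback_duration(num_samples, 16000)  # interpreted at the 16 kHz default
```

Here the same samples read back at 16 kHz last half as long, so the waveform appears shrunk.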
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3372
      
      Reviewed By: hwangjeff
      
      Differential Revision: D46234772
      
      Pulled By: mthrok
      
      fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552
  13. 26 May, 2023 3 commits
    • Fix encoding g722 format (#3373) · 1b05ca7e
      moto authored
      Summary:
      The g722 format only supports 16 kHz, but AVCodec does not list this. The implementation does not insert resampling, so the resulting audio can be slowed down or sped up.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3373
      
      Reviewed By: hwangjeff
      
      Differential Revision: D46233181
      
      Pulled By: mthrok
      
      fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c
    • Temporarily remove test for extract_features (#3378) · 05649ca3
      Zhaoheng Ni authored
      Summary:
      The tests failed for several bundles. This commit removes them; they will be re-added once the root cause is figured out.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3378
      
      Reviewed By: atalman
      
      Differential Revision: D46230884
      
      Pulled By: nateanl
      
      fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92
    • Improve RNN-T streaming decoding (#3295) · 9fc0dcaa
      Lakshmi Krishnan authored
      Summary:
      This commit fixes the following issues affecting streaming decoding quality:
      1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided.
      2. Allows the decoder to receive top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
      3. Some minor errors regarding shape checking for length.
      
      This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.
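
A toy sketch of carrying top-K hypotheses across chunks. It is not the RNN-T decoder itself: `decode_chunk`, the `Hypothesis` tuple, and the per-chunk candidate list are hypothetical, and real decoding scores tokens with a model rather than a fixed list.

```python
import heapq
from typing import List, Tuple

Hypothesis = Tuple[float, List[str]]  # (log-probability score, tokens so far)


def decode_chunk(
    candidates: List[Tuple[str, float]],
    prev: List[Hypothesis],
    k: int,
) -> List[Hypothesis]:
    """Extend every carried-over hypothesis and keep the K best."""
    if not prev:
        # Start from an empty hypothesis only when none are provided.
        prev = [(0.0, [])]
    expanded = [
        (score + log_prob, tokens + [token])
        for score, tokens in prev
        for token, log_prob in candidates
    ]
    # nlargest returns the results sorted from best to worst score.
    return heapq.nlargest(k, expanded, key=lambda hyp: hyp[0])


# Each call receives the hypotheses returned by the previous call.
hyps = decode_chunk([("a", -0.1), ("b", -0.5)], [], k=2)
hyps = decode_chunk([("c", -0.2)], hyps, k=2)
```

Because each hypothesis holds every token emitted so far, the best entry after each chunk is the full transcript up to that point rather than an increment.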
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3295
      
      Reviewed By: nateanl
      
      Differential Revision: D46216113
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
  14. 24 May, 2023 1 commit
    • Update smoke test (#3346) · 71b2634b
      moto authored
      Summary:
      * Delay the import of torchaudio until the CLI options are parsed.
      * Add option to set log level to DEBUG so that it's easy to see the issue with external libraries.
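
A minimal sketch of the two changes listed above, assuming a script of this general shape. The `--debug` and `--module` options are hypothetical, and `json` stands in for torchaudio so the sketch runs anywhere.

```python
import argparse
import importlib
import logging


def main(argv=None):
    parser = argparse.ArgumentParser(description="smoke test sketch")
    parser.add_argument("--debug", action="store_true", help="set log level to DEBUG")
    parser.add_argument("--module", default="json", help="module to import (stand-in)")
    args = parser.parse_args(argv)

    # Configure logging first so messages emitted during import are visible.
    logging.getLogger().setLevel(logging.DEBUG if args.debug else logging.INFO)

    # Import only after the CLI options have been parsed.
    module = importlib.import_module(args.module)
    return module.__name__, logging.getLogger().level
```

Deferring the import means `--help` and argument errors return without loading the library or its external dependencies.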
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3346
      
      Reviewed By: nateanl
      
      Differential Revision: D46022546
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9f988bbd770c2fd2bb260c3cfe02b238a9da2808
  15. 23 May, 2023 3 commits
  16. 22 May, 2023 1 commit
  17. 20 May, 2023 1 commit
  18. 17 May, 2023 1 commit
    • Add 420p10le CPU support to StreamReader (#3332) · c12f4734
      moto authored
      Summary:
      This commit adds support for decoding the YUV420P010LE format.
      
      The image tensor returned for this format has:
      - NCHW format (C == 3)
      - int16 type
      - value range [0, 2^10).
      
      Note that the value range is different from what "hevc_cuvid" decoder
      returns. "hevc_cuvid" decoder uses full range of int16 (internally,
      it's uint16) to express the color (with some intervals), but the values
      returned by the CPU "hevc" decoder are within [0, 2^10).
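
As an illustration of the [0, 2^10) range, the sketch below maps a 10-bit value to a float in [0, 1]. `normalize_10bit` is a hypothetical helper; values from a decoder that uses a wider range would need a different divisor.

```python
MAX_10BIT = 2 ** 10 - 1  # 1023, the largest value in [0, 2^10)


def normalize_10bit(value: int) -> float:
    """Map a 10-bit color value to the range [0.0, 1.0]."""
    if not 0 <= value <= MAX_10BIT:
        raise ValueError(f"{value} is outside [0, 2^10)")
    return value / MAX_10BIT
```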
      
      Address https://github.com/pytorch/audio/issues/3331
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3332
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45925097
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508
  19. 10 May, 2023 2 commits
  20. 09 May, 2023 1 commit
  21. 05 May, 2023 1 commit
    • Add SpecAugment transform (#3309) · 82febc59
      Xiaohui Zhang authored
      Summary:
      (2/2 of the previous https://github.com/pytorch/audio/pull/2360 which I accidentally closed)
      
      The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
      - Only zero masking can be done; masking by mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
      - For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding.
      - It's not straightforward to apply multiple time/frequency masks with the current design. If we need N masks across the time/frequency axis, we need to sequentially apply N Frequency/TimeMasking transforms to the input tensors, which makes for an inconvenient API. We need to introduce a separate SpecAugment transform to handle this.
      
      To solve these issues, here we
      - [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
      - [done in this PR] Introduce the SpecAugment transform.
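
A plain-Python sketch of applying several time masks in one call with a configurable fill value, which is the convenience the list above asks for. This is not the torchaudio SpecAugment transform: `mask_time`, its parameters, and the `spec[freq][time]` layout are assumptions made for illustration.

```python
import random
from typing import List, Optional


def mask_time(
    spec: List[List[float]],
    n_masks: int,
    max_width: int,
    fill: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Apply n_masks random time masks to a spectrogram laid out as spec[freq][time]."""
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # leave the input untouched
    n_frames = len(out[0])
    for _ in range(n_masks):
        width = rng.randint(0, max_width)
        start = rng.randint(0, max(0, n_frames - width))
        for row in out:
            for t in range(start, min(start + width, n_frames)):
                row[t] = fill
    return out


spec = [[1.0] * 10 for _ in range(4)]
mean_value = sum(sum(row) for row in spec) / (len(spec) * len(spec[0]))
masked = mask_time(spec, n_masks=2, max_width=3, fill=mean_value, rng=random.Random(0))
```

Passing the mean as `fill` gives mean-value masking, while the default gives zero masking.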
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3309
      
      Reviewed By: nateanl
      
      Differential Revision: D45592926
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2
  22. 04 May, 2023 1 commit
    • Extend mask_along_axis{,_iid} (#3289) · 74bd971a
      Xiaohui Zhang authored
      Summary:
      (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)
      
      The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
      - Only zero masking can be done; masking by mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
      - For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding.
      - It's not straightforward to apply multiple time/frequency masks with the current design.
      
      To solve these issues, here we
      - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
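
The idea of masking one of the last two dimensions regardless of leading dimensions can be sketched with nested lists. `mask_last_two` is a hypothetical helper with a fixed mask range; the real functions draw the mask position randomly and operate on tensors.

```python
def mask_last_two(x, axis, start, end, fill=0.0):
    """Mask indices [start, end) along axis -2 or -1 of a nested list (2D or deeper)."""
    if axis not in (-2, -1):
        raise ValueError("axis must be -2 or -1")
    if not isinstance(x[0][0], list):
        # x is 2D: apply the mask directly.
        if axis == -2:
            return [
                [fill] * len(row) if start <= i < end else row[:]
                for i, row in enumerate(x)
            ]
        return [
            [fill if start <= j < end else v for j, v in enumerate(row)]
            for row in x
        ]
    # Leading (batch/channel) dimensions: recurse with the same mask.
    return [mask_last_two(sub, axis, start, end, fill) for sub in x]
```

This version shares one mask across all leading dimensions; an iid variant would pick a separate range for each sub-array.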
      
      The introduction of the SpecAugment transform will be done in another PR.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3289
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45460357
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
  23. 28 Apr, 2023 1 commit
    • Add cuctc decoder (#3096) · 0a1801ed
      Yuekai Zhang authored
      Summary:
      This PR implements a CUDA-based CTC prefix beam search decoder.
      
      Several benchmark results using a V100 are attached below:
      | decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
      |--------------|-------|---------|----------------------|-----------|------------|------------|-------------------|------------|
      | cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
      | cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |
      
      Note:
      1. The design parallelizes computation across the batch and vocab axes and loops over the frame axis. It is therefore better suited to shorter sequence lengths and larger vocab sizes than the CPU implementations.
      2. WER is the same as with the CPU implementations. However, it cannot yet decode with an LM.
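
For reference, a CPU sketch of CTC prefix beam search without an LM, written in plain Python. It is not the CUDA implementation from this PR: the function name and the use of plain probabilities instead of log-probabilities are simplifications made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Prefix = Tuple[int, ...]


def ctc_prefix_beam_search(
    probs: Sequence[Sequence[float]], beam_size: int, blank: int = 0
) -> List[Tuple[Prefix, float]]:
    """probs[t][v] is the probability of token v at frame t."""
    # Each prefix tracks (prob. of paths ending in blank, prob. ending in non-blank).
    beam: Dict[Prefix, Tuple[float, float]] = {(): (1.0, 0.0)}
    for frame in probs:  # frames must be processed sequentially
        next_beam = defaultdict(lambda: [0.0, 0.0])
        # The loops over prefixes and vocab entries are independent of each other.
        for prefix, (p_b, p_nb) in beam.items():
            for token, p in enumerate(frame):
                if token == blank:
                    next_beam[prefix][0] += (p_b + p_nb) * p
                elif prefix and token == prefix[-1]:
                    # A repeated token collapses unless a blank separates it.
                    next_beam[prefix][1] += p_nb * p
                    next_beam[prefix + (token,)][1] += p_b * p
                else:
                    next_beam[prefix + (token,)][1] += (p_b + p_nb) * p
        ranked = sorted(next_beam.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beam = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_size]}
    # Dicts keep insertion order, so the result is sorted best-first.
    return [(prefix, pb + pnb) for prefix, (pb, pnb) in beam.items()]
```

The outer loop over frames is inherently sequential, while the work inside each frame is what a GPU decoder can spread across the batch and vocab axes.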
      
      Resolves: https://github.com/pytorch/audio/issues/2957.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3096
      
      Reviewed By: nateanl
      
      Differential Revision: D44709397
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
  24. 12 Apr, 2023 2 commits
    • Allow overwrite temp data in ffmpeg test (#3263) · cc7b8bd4
      moto authored
      Summary:
      When `TORCHAUDIO_TEST_TEMP_DIR` is set,
      all the unit test temporary data are stored in the given directory.
      Running unit tests multiple times reuses the
      directory and the temporary files from the
      previous test runs are found there.
      
      The FFmpeg save test writes reference data to the
      temporary directory, but it was not given the
      overwrite flag ("-y"), so it fails in such cases.
      
      This commit fixes that.
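
A minimal sketch of building an ffmpeg command line with the overwrite flag. `ffmpeg_command` is a hypothetical helper, not the test utility changed by this commit; only the `-y` and `-i` options are taken from ffmpeg itself.

```python
from typing import List


def ffmpeg_command(src: str, dst: str, overwrite: bool = True) -> List[str]:
    """Build an ffmpeg invocation that converts src to dst."""
    cmd = ["ffmpeg"]
    if overwrite:
        # "-y" overwrites existing output files without asking.
        cmd.append("-y")
    cmd += ["-i", src, dst]
    return cmd
```

The resulting list can be passed to `subprocess.run` as is, so a reused temporary directory no longer causes the command to stop at an existing output file.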
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3263
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44859003
      
      Pulled By: mthrok
      
      fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b
    • Specify backend directly in test (#3262) · 563e409c
      moto authored
      Summary:
      Preparation to land https://github.com/pytorch/audio/pull/3241
      
      This commit applies a patch to make the sox_io TorchScript test pass when the dispatcher is enabled.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3262
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44897513
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9b65f705cd02324328a2bc1c414aa4b7ca0fed32
  25. 05 Apr, 2023 1 commit
  26. 03 Apr, 2023 1 commit
  27. 01 Apr, 2023 1 commit
    • Add AudioEffector (#3163) · a4036248
      moto authored
      Summary:
      This commit adds a new feature, AudioEffector, which can be used to
      apply various effects and codecs to waveforms in Tensor.
      
      Under the hood it uses StreamWriter and StreamReader to apply
      filters and encode/decode.
      
      This is going to replace the deprecated `apply_codec` and
      `apply_sox_effect_tensor` functions.
      
      It can also perform online, chunk-by-chunk filtering.
      
      Tutorial to follow.
      
      closes https://github.com/pytorch/audio/issues/3161
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3163
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44576660
      
      Pulled By: mthrok
      
      fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb
  28. 30 Mar, 2023 2 commits
    • Support encode spec change in StreamWriter (#3207) · 1b648626
      moto authored
      Summary:
      This commit adds support for changing the spec of media
      (such as sample rate, #channels, image size and frame rate)
      on-the-fly at encoding time.
      
      The motivation behind this addition is that certain media
      formats support only a limited number of specs, and it is
      cumbersome to require client code to change the spec
      every time.
      
      For example, OPUS supports only a 48 kHz sampling rate, and
      vorbis only supports stereo.
      
      To make it easy to work with media of different formats,
      this commit makes it so that anything that's not compatible
      with the format is automatically converted, and allows
      users to specify the override.
      
      A notable implementation detail is that, for sample format and
      pixel format, the encoder's default value takes precedence over
      the source value, while for other attributes like sample rate and
      #channels, the source value takes precedence as long as
      it is supported.
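
The precedence rule above can be sketched as two small selection functions. The names, the `supported` list, and the fallback to the encoder default for unsupported rates are assumptions made for illustration; the actual StreamWriter logic may choose differently.

```python
from typing import Optional, Sequence


def pick_sample_rate(
    source_rate: int,
    supported: Optional[Sequence[int]],
    encoder_default: int,
    override: Optional[int] = None,
) -> int:
    """Sample rate: a user override wins, then the source value if supported."""
    if override is not None:
        return override
    if not supported or source_rate in supported:
        return source_rate
    # Assumption: fall back to the encoder default when the source rate is unsupported.
    return encoder_default


def pick_sample_format(
    source_fmt: str, encoder_default_fmt: str, override: Optional[str] = None
) -> str:
    """Sample format: a user override wins, then the encoder default."""
    if override is not None:
        return override
    return encoder_default_fmt
```

For an encoder that lists only 48000 Hz, a 44100 Hz source would be converted to 48000 Hz, while a 48000 Hz source passes through unchanged.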
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3207
      
      Reviewed By: nateanl
      
      Differential Revision: D44439622
      
      Pulled By: mthrok
      
      fbshipit-source-id: 09524f201d485d201150481884a3e9e4d2aab081
    • Support changing the number of channels in StreamReader (#3216) · 4bc4ca75
      moto authored
      Summary:
      This commit adds the `num_channels` argument,
      which allows one to change the number of channels on-the-fly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3216
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44516925
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3e5a11b3fdbb19071f712a8148e27aff60341df3