"...linux/git@developer.sourcefind.cn:OpenDAS/torchaudio.git" did not exist on "95d9f2d272b2814010db7fc803a7d4dc6cf0c3b4"
- 10 Jul, 2023 1 commit
-
-
moto authored
Summary: 1. Update the smoke test script to change directory so that there is no `torchaudio` directory in CWD when the smoke test is executed. 2. Disable the part of the smoke test which requires FFmpeg for wheels. Preparation for https://github.com/pytorch/test-infra/pull/4358 Pull Request resolved: https://github.com/pytorch/audio/pull/3465 Reviewed By: nateanl Differential Revision: D47345117 Pulled By: mthrok fbshipit-source-id: 95aad0a22922d44ee9a24a05d9ece85166b8c17e
-
- 05 Jul, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3433 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batched Tensors (3D Tensor for `log_probs` and 2D Tensor for `targets`). Reviewed By: mthrok Differential Revision: D46657526 fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89
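A minimal sketch of the batched call shape described above; the `torchaudio.functional` namespace and argument names follow the forced-alignment docs, and the shapes are illustrative only.

```python
# Minimal sketch of the batched forced_align API; shapes are illustrative.
import torch
import torchaudio.functional as F

log_probs = torch.randn(1, 100, 32).log_softmax(dim=-1)      # (batch, frames, num_classes)
targets = torch.randint(1, 32, (1, 20), dtype=torch.int32)   # (batch, target_length)
input_lengths = torch.tensor([100], dtype=torch.int32)
target_lengths = torch.tensor([20], dtype=torch.int32)

# Returns per-frame token indices and scores for each item in the batch.
aligned_tokens, scores = F.forced_align(log_probs, targets, input_lengths, target_lengths, blank=0)
```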
-
- 21 Jun, 2023 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3427 Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms. Reviewed By: mthrok Differential Revision: D46547418 fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960
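A short sketch of how the two prototype transforms might be used; the module path and parameter names follow the torchaudio prototype docs, and the values are illustrative.

```python
# Sketch of the two prototype chroma transforms.
import torch
from torchaudio.prototype.transforms import ChromaSpectrogram, ChromaScale

waveform = torch.randn(1, 16000)
chromagram = ChromaSpectrogram(sample_rate=16000, n_fft=400)(waveform)   # waveform -> chromagram

spec = torch.rand(1, 201, 101)                                           # linear-frequency spectrogram (201 bins)
chromagram2 = ChromaScale(sample_rate=16000, n_freqs=201)(spec)          # spectrogram -> chromagram
```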
-
- 14 Jun, 2023 1 commit
-
-
moto authored
Summary: Currently, AudioEffector always resamples to the original sample rate. It is more flexible to allow overriding this with an arbitrary sample rate. Pull Request resolved: https://github.com/pytorch/audio/pull/3374 Differential Revision: D46235358 Pulled By: mthrok fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4
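A minimal sketch of the override, assuming the new parameter is exposed as `output_sample_rate` on `AudioEffector.apply`.

```python
# Sketch assuming the override is exposed as `output_sample_rate`.
import torch
from torchaudio.io import AudioEffector

waveform = torch.randn(16000, 1)                       # (frames, channels)
effector = AudioEffector(effect="lowpass=frequency=300")
out = effector.apply(waveform, sample_rate=16000, output_sample_rate=8000)
```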
-
- 09 Jun, 2023 1 commit
-
-
moto authored
Summary: The new version of transformers changed the format of the pre-trained weights. Fixing it is low priority for the maintenance team, so we disable the test. See https://github.com/pytorch/audio/issues/3430 Pull Request resolved: https://github.com/pytorch/audio/pull/3431 Differential Revision: D46592883 Pulled By: mthrok fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6
-
- 08 Jun, 2023 2 commits
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3395 Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`. Reviewed By: mthrok Differential Revision: D46307672 fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5
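A minimal sketch of building the filter bank and applying it to a linear spectrogram; argument names follow the prototype docs, values are illustrative.

```python
# Sketch: build a chroma filter bank and project a linear spectrogram onto it.
import torch
from torchaudio.prototype.functional import chroma_filterbank

fb = chroma_filterbank(sample_rate=16000, n_freqs=201, n_chroma=12)   # (n_freqs, n_chroma)
spec = torch.rand(201, 101)                                           # (freq, time)
chroma = fb.T @ spec                                                  # (n_chroma, time)
```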
-
moto authored
Summary: The StreamReader decoding process is composed of three steps: 1. Decode the incoming AVPacket into AVFrame 2. Pass the AVFrame through AVFilter to perform post-processing 3. Convert the resulting AVFrame The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time the output stream is defined, and the output stream shape can be retrieved. For the CPU decoder this works fine, because resizing happens in step 2 and the resulting shape is retrievable. However, this is problematic for the GPU decoder, as resizing is currently done via a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. This refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405 AVFilter is internally adaptive to changes of the input frame size. This commit changes the conversion process to behave similarly, so that it waits until the first frame comes in to finalize the frame shape. Fix https://github.com/pytorch/audio/issues/3405 Pull Request resolved: https://github.com/pytorch/audio/pull/3419 Differential Revision: D46557505 Pulled By: mthrok fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
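For context, a sketch of the GPU-decoder path described above, where resizing is a decoder option (step 1) and the output shape is only known after the first frame is decoded; the decoder and option names follow the torchaudio NVDEC tutorial, and "input.mp4" is a placeholder.

```python
# Sketch of the NVDEC path: resizing happens inside the decoder, so the frame
# shape is finalized only once decoding starts.
from torchaudio.io import StreamReader

r = StreamReader("input.mp4")
r.add_video_stream(
    frames_per_chunk=1,
    decoder="h264_cuvid",
    decoder_option={"resize": "320x180"},
    hw_accel="cuda:0",
)
for (chunk,) in r.stream():
    print(chunk.shape)   # shape becomes available once the first frame arrives
    break
```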
-
- 07 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415 Differential Revision: D46526437 Pulled By: mthrok fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b
-
- 06 Jun, 2023 3 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410 Differential Revision: D46496786 Pulled By: mthrok fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059
-
Moto Hira authored
Differential Revision: D46126226 Original commit changeset: 42cb52b19d91 Original Phabricator Diff: D46126226 fbshipit-source-id: 372b2526d9e196e37e014f1556bf117d29bb1ac6
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3365 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batched Tensors (3D Tensor for `log_probs` and 2D Tensor for `targets`). Reviewed By: vineelpratap Differential Revision: D46126226 fbshipit-source-id: 42cb52b19d91bbff7dc040ccf60350545d75b3a2
-
- 02 Jun, 2023 1 commit
-
-
moto authored
Summary: This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor, so that there is only one BLAS library within torchaudio. Recently, we have been making torchaudio leaner, and we don't see wide adoption of the kaldi_pitch feature, so we decided to remove it. See some of the discussion in https://github.com/pytorch/audio/issues/1269 Pull Request resolved: https://github.com/pytorch/audio/pull/3368 Differential Revision: D46406176 Pulled By: mthrok fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
-
- 01 Jun, 2023 3 commits
-
-
moto authored
Summary: This commit removes file-like object support so that we can remove the custom patch. The motivation and plan are outlined in https://github.com/pytorch/audio/issues/2950. Pull Request resolved: https://github.com/pytorch/audio/pull/3035 Reviewed By: hwangjeff Differential Revision: D44695647 Pulled By: mthrok fbshipit-source-id: 13af0234e288c041bc7b490e1f967f85ce7eb8ec
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3398 Reviewed By: nateanl Differential Revision: D46354862 Pulled By: mthrok fbshipit-source-id: b86dcdfeff8ed9db87b0b78eca20f6f18117e97e
-
moto authored
Summary: The arguments of TorchAudio's save function ("format", "bits_per_sample" and "encoding") do not map one-to-one to the arguments of FFmpeg encoding. For example, to use the vorbis codec, FFmpeg expects the "ogg" container/extension with the "vorbis" encoder. It does not recognize the "vorbis" extension the way TorchAudio (libsox) does. This commit refactors the logic that parses/maps the arguments. As a result, it now works properly with the vorbis and mp3 extensions. Pull Request resolved: https://github.com/pytorch/audio/pull/3387 Reviewed By: hwangjeff Differential Revision: D46328787 Pulled By: mthrok fbshipit-source-id: 36f993952a062bfec58a8b51be6aa86297571f90
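A minimal sketch, assuming dispatcher mode with the per-call `backend` keyword; after this change the FFmpeg backend maps the arguments and extension to the matching container/encoder pair (e.g. "ogg" + "vorbis").

```python
# Minimal sketch, assuming dispatcher mode and the per-call `backend` keyword.
import torch
import torchaudio

waveform = torch.randn(1, 16000)
# FFmpeg expects an "ogg" container with the "vorbis" encoder; the refactored
# mapping picks that combination from the arguments/extension.
torchaudio.save("out.ogg", waveform, 16000, backend="ffmpeg")
```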
-
- 30 May, 2023 1 commit
-
-
atalman authored
Summary: Disable failing GPU unit test. See associated issue: https://github.com/pytorch/audio/issues/3376 Pull Request resolved: https://github.com/pytorch/audio/pull/3384 Reviewed By: mthrok Differential Revision: D46279324 Pulled By: atalman fbshipit-source-id: 3a606bb992e0261451f48d1fb458e054f7fd5583
-
- 27 May, 2023 1 commit
-
-
moto authored
Summary: When encoding audio with mulaw, the resulting data does not have a header, and StreamReader defaults to 16 kHz, which can stretch/shrink the resulting waveform. Pull Request resolved: https://github.com/pytorch/audio/pull/3372 Reviewed By: hwangjeff Differential Revision: D46234772 Pulled By: mthrok fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552
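An illustrative sketch of why a header-less mu-law stream needs the sample rate supplied explicitly; the `format`/`option` values follow FFmpeg's raw mulaw demuxer, and the file name is a placeholder.

```python
# Sketch: raw mu-law data has no header, so the demuxer must be told the rate;
# otherwise StreamReader falls back to a default and the waveform is stretched.
from torchaudio.io import StreamReader

r = StreamReader("audio.ul", format="mulaw", option={"sample_rate": "44100"})
r.add_basic_audio_stream(frames_per_chunk=-1)
r.process_all_packets()
(chunk,) = r.pop_chunks()
```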
-
- 26 May, 2023 3 commits
-
-
moto authored
Summary: The g722 format only supports 16 kHz, but AVCodec does not list this. The implementation does not insert resampling, so the resulting audio can be slowed down or sped up. Pull Request resolved: https://github.com/pytorch/audio/pull/3373 Reviewed By: hwangjeff Differential Revision: D46233181 Pulled By: mthrok fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c
-
Zhaoheng Ni authored
Summary: The tests failed for several bundles. Remove them for now; they will be re-added once the root cause is figured out. Pull Request resolved: https://github.com/pytorch/audio/pull/3378 Reviewed By: atalman Differential Revision: D46230884 Pulled By: nateanl fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92
-
Lakshmi Krishnan authored
Summary: This commit fixes the following issues affecting streaming decoding quality: 1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided. 2. Allows the decoder to receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies. 3. Fixes some minor errors regarding shape checking for lengths. This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in the transcript. Pull Request resolved: https://github.com/pytorch/audio/pull/3295 Reviewed By: nateanl Differential Revision: D46216113 Pulled By: hwangjeff fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
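A sketch of streaming inference where the previous step's hypotheses are fed back in, per item 2 above; the `infer` call follows the RNNTBeamSearch documentation, while `model` and the feature `segments` are assumed to exist.

```python
# Sketch of feeding the previous hypotheses back into the next decoding step.
# `model` and `segments` are assumed to exist.
from torchaudio.models import RNNTBeamSearch

decoder = RNNTBeamSearch(model, blank=4096)
state, hypotheses = None, None
for segment, length in segments:   # streaming feature chunks
    hypotheses, state = decoder.infer(segment, length, 10, state=state, hypothesis=hypotheses)
# After this change, `hypotheses` covers the whole transcript so far, not just the increment.
```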
-
- 24 May, 2023 1 commit
-
-
moto authored
Summary: * Delay the import of torchaudio until the CLI options are parsed. * Add an option to set the log level to DEBUG so that it's easy to see issues with external libraries. Pull Request resolved: https://github.com/pytorch/audio/pull/3346 Reviewed By: nateanl Differential Revision: D46022546 Pulled By: mthrok fbshipit-source-id: 9f988bbd770c2fd2bb260c3cfe02b238a9da2808
-
- 23 May, 2023 3 commits
-
-
Zhaoheng Ni authored
Summary: Resolves https://github.com/pytorch/audio/issues/3347 `position_bias` is ignored in the `extract_features` method; this doesn't affect the Wav2Vec2 or HuBERT models, but it changes the output of the transformer layers (except the first layer) in the WavLM model. This PR fixes it by adding `position_bias` to the method. Pull Request resolved: https://github.com/pytorch/audio/pull/3350 Reviewed By: mthrok Differential Revision: D46112148 Pulled By: nateanl fbshipit-source-id: 3d21aa4b32b22da437b440097fd9b00238152596
-
Omkar Salpekar authored
Summary: As discussed in the [Torchaudio Migration Proposal](https://docs.google.com/document/d/1PF8biwiGzsjzfEBM78mlLiRrkcsGsvuYkeqkI66Ym8A/edit), this PR moves MacOS unittest job to Nova tooling. Note that this does not touch anything within the existing CircleCI job at the moment. Passing job: https://github.com/pytorch/audio/actions/runs/4932497525/jobs/8815581251?pr=3324 Pull Request resolved: https://github.com/pytorch/audio/pull/3324 Reviewed By: atalman, mthrok Differential Revision: D46113524 Pulled By: osalpekar fbshipit-source-id: d048d300489f992fa187628cb6744d95ab4fb68a
-
Zhaoheng Ni authored
Summary: Fix https://github.com/pytorch/audio/issues/3361 When adding FunctionalCUDAOnlyTest, the class should inherit from `TestBaseMixin` instead of `Functional` Pull Request resolved: https://github.com/pytorch/audio/pull/3363 Reviewed By: atalman, osalpekar Differential Revision: D46112084 Pulled By: nateanl fbshipit-source-id: 67c6472fda98cb718e0fc53ab248beda745feab5
-
- 22 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3354 When `start == 0`, the first item, rather than the S-th item, of the t-th row in `backPtr_a` should be 0. Reviewed By: xiaohui-zhang Differential Revision: D46059971 fbshipit-source-id: 89933134878513034eae033764b19f8562f24cb8
-
- 20 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3348 This pull request adds a CTC-based forced alignment function that supports both CPU and CUDA devices. The function takes the CTC emissions and target labels as inputs and generates the corresponding label for each frame. Reviewed By: vineelpratap, xiaohui-zhang Differential Revision: D45867265 fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb
-
- 17 May, 2023 1 commit
-
-
moto authored
Summary: This commit adds support for decoding the YUV420P010LE format. The image tensor returned for this format has: NCHW format (C == 3), int16 type, and value range [0, 2^10). Note that the value range is different from what the "hevc_cuvid" decoder returns. The "hevc_cuvid" decoder uses the full range of int16 (internally, it's uint16) to express the color (with some intervals), but the values returned by the CPU "hevc" decoder are within [0, 2^10). Address https://github.com/pytorch/audio/issues/3331 Pull Request resolved: https://github.com/pytorch/audio/pull/3332 Reviewed By: hwangjeff Differential Revision: D45925097 Pulled By: mthrok fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508
-
- 10 May, 2023 2 commits
-
-
moto authored
Summary: This commit makes the code default to the backend dispatcher. Enabling the backend dispatcher puts the FFmpeg-based I/O implementation at higher priority (if the corresponding FFmpeg is available), and allows individual function calls to specify the backend. See also https://github.com/pytorch/audio/issues/2950 Pull Request resolved: https://github.com/pytorch/audio/pull/3241 Reviewed By: hwangjeff Differential Revision: D44709068 Pulled By: mthrok fbshipit-source-id: 43aac3433f78a681df6669e9ac46e8ecf3beb1be
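A minimal sketch of what per-call backend selection looks like once the dispatcher is the default; availability of the FFmpeg backend depends on the local FFmpeg installation.

```python
# Sketch of per-call backend selection under the dispatcher.
import torchaudio

waveform, sr = torchaudio.load("audio.wav")                          # dispatcher picks the highest-priority backend
waveform_ff, sr_ff = torchaudio.load("audio.wav", backend="ffmpeg")  # force FFmpeg for this call
```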
-
Zhaoheng Ni authored
Summary: Address https://github.com/pytorch/audio/issues/2643 - Replace `SGD` optimization with `torch.linalg.lstsq`, which is much faster. - Add an autograd test for `InverseMelScale`. - Update other tests. Pull Request resolved: https://github.com/pytorch/audio/pull/3280 Reviewed By: hwangjeff Differential Revision: D45679988 Pulled By: nateanl fbshipit-source-id: a42e8bff9dc0f38e47e0482fd8a2aad902eedd59
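For reference, a short sketch of the transform whose solver this change swaps out; the parameters follow the documented `InverseMelScale` signature, and the input shape is illustrative.

```python
# Sketch of the transform whose internals now use torch.linalg.lstsq.
import torch
from torchaudio.transforms import InverseMelScale

melspec = torch.rand(1, 128, 100)                              # (channel, n_mels, time)
inverse = InverseMelScale(n_stft=201, n_mels=128, sample_rate=16000)
spec = inverse(melspec)                                        # (channel, n_stft, time)
```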
-
- 09 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The batch consistency test function should call `InverseBarkScale` instead of `InverseMelScale`. Pull Request resolved: https://github.com/pytorch/audio/pull/3322 Reviewed By: mthrok Differential Revision: D45691769 Pulled By: nateanl fbshipit-source-id: 4a1ed80c4a56c3a847a49a8d02f8b5cbe4f09045
-
- 05 May, 2023 1 commit
-
-
Xiaohui Zhang authored
Summary: (2/2 of the previous https://github.com/pytorch/audio/pull/2360, which I accidentally closed) The previous way of doing SpecAugment via the Frequency/TimeMasking transforms has the following problems: - Only zero masking can be done; masking by the mean value is not supported. - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor. - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors, due to the above hard-coding. - For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors, due to the above hard-coding. - It's not straightforward to apply multiple time/frequency masks with the current design. If we need N masks along the time/frequency axis, we have to sequentially apply N Frequency/TimeMasking transforms to the input tensors, and such an API is very inconvenient. We need to introduce a separate SpecAugment transform to handle this. To solve these issues: [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors, so that both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor; [done in this PR] introduce the SpecAugment transform. Pull Request resolved: https://github.com/pytorch/audio/pull/3309 Reviewed By: nateanl Differential Revision: D45592926 Pulled By: xiaohui-zhang fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2
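A minimal sketch of the new transform; the parameter names (multiple masks per axis, mean-value masking via `zero_masking=False`) are assumptions based on this summary.

```python
# Sketch of the SpecAugment transform added here; parameter names are assumptions.
import torch
from torchaudio.transforms import SpecAugment

spec = torch.randn(4, 201, 100)        # (batch, freq, time)
augment = SpecAugment(
    n_time_masks=2, time_mask_param=30,
    n_freq_masks=2, freq_mask_param=20,
    zero_masking=False,                # mask with the mean value instead of zeros
)
out = augment(spec)
```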
-
- 04 May, 2023 1 commit
-
-
Xiaohui Zhang authored
Summary: (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360), which I accidentally closed) The previous way of doing SpecAugment via the Frequency/TimeMasking transforms has the following problems: - Only zero masking can be done; masking by the mean value is not supported. - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor. - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors, due to the above hard-coding. - For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors, due to the above hard-coding. - It's not straightforward to apply multiple time/frequency masks with the current design. To solve these issues, we extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor. The introduction of the SpecAugment transform will be done in another PR. Pull Request resolved: https://github.com/pytorch/audio/pull/3289 Reviewed By: hwangjeff Differential Revision: D45460357 Pulled By: xiaohui-zhang fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
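A minimal sketch of masking a 2D spectrogram directly, which the extension above enables; the axis index convention for 2D inputs is an assumption based on the summary.

```python
# Sketch: masking a 2D (freq, time) spectrogram directly, with no batch or
# channel dimension; the axis index for 2D inputs is an assumption.
import torch
import torchaudio.functional as F

spec = torch.randn(201, 100)
masked = F.mask_along_axis(spec, mask_param=30, mask_value=0.0, axis=1)  # mask along time
```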
-
- 28 Apr, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: This PR implements a CUDA-based CTC prefix beam search decoder. Several benchmark results using a V100 are attached below:

| decoder type | model | datasets | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
|--------------|-------|----------|----------------------|-----------|------------|------------|-------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68s | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6s | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22s | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5s | 10 | 24 | char | 4 | 4233 |

Note: 1. The design is to parallelize computation across the batch and vocab axes and loop over the frame axis, so it is more friendly to smaller sequence lengths and larger vocab sizes compared with CPU implementations. 2. WER is the same as the CPU implementations. However, it can't decode with an LM yet. Resolves: https://github.com/pytorch/audio/issues/2957. Pull Request resolved: https://github.com/pytorch/audio/pull/3096 Reviewed By: nateanl Differential Revision: D44709397 Pulled By: mthrok fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
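A sketch of the decoder as exposed in later torchaudio releases via `cuda_ctc_decoder`; whether this exact PR adds the Python factory is an assumption, and the token file is a placeholder.

```python
# Sketch of the CUDA CTC prefix beam search decoder; "tokens.txt" is a placeholder.
import torch
from torchaudio.models.decoder import cuda_ctc_decoder

decoder = cuda_ctc_decoder("tokens.txt", nbest=1, beam_size=10)
log_probs = torch.randn(1, 100, 29, device="cuda").log_softmax(-1)   # (batch, frames, vocab)
lengths = torch.tensor([100], dtype=torch.int32, device="cuda")
results = decoder(log_probs, lengths)                                # nbest hypotheses per batch item
```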
-
- 12 Apr, 2023 2 commits
-
-
moto authored
Summary: When `TORCHAUDIO_TEST_TEMP_DIR` is set, all the unit test temporary data are stored in the given directory. Running unit tests multiple times reuses the directory and the temporary files from the previous test runs are found there. FFmpeg save test writes reference data to the temporary directory, but it is not given the overwrite flag ("-y"), so it fails in such cases. This commit fixes that. Pull Request resolved: https://github.com/pytorch/audio/pull/3263 Reviewed By: hwangjeff Differential Revision: D44859003 Pulled By: mthrok fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b
-
moto authored
Summary: Preparation to land https://github.com/pytorch/audio/pull/3241 This commit applies patch to make the sox_io TorchScript test pass when dispatcher is enabled. Pull Request resolved: https://github.com/pytorch/audio/pull/3262 Reviewed By: hwangjeff Differential Revision: D44897513 Pulled By: mthrok fbshipit-source-id: 9b65f705cd02324328a2bc1c414aa4b7ca0fed32
-
- 05 Apr, 2023 1 commit
-
-
moto authored
Summary: In dispatcher mode, the FFmpeg backend does not handle file-like objects, and the C++ implementation raises an error. This commit fixes it by normalizing file-like objects to strings. Pull Request resolved: https://github.com/pytorch/audio/pull/3243 Reviewed By: nateanl Differential Revision: D44719280 Pulled By: mthrok fbshipit-source-id: 9dae459e2a5fb4992b4ef53fe4829fe8c35b2edd
-
- 03 Apr, 2023 1 commit
-
-
moto authored
Summary: Currently, creating a CTCDecoder object by passing a language model to the `lm` argument, without assigning it to a variable elsewhere, causes `RuntimeError: Tried to call pure virtual function "LM::start"`. According to discussions on PyBind11 (https://github.com/pybind/pybind11/discussions/4013 and https://github.com/pybind/pybind11/pull/2839), this is due to the Python object being garbage-collected by the time it is used by code implemented in C++. The C++ code attempts to call methods defined in Python, which override the base pure virtual function, but the object that provides this override gets deleted by the garbage collector, as the original object is not reference counted. This commit fixes this by simply assigning the given `lm` object as an attribute of the CTCDecoder class. Address https://github.com/pytorch/audio/issues/3218 Pull Request resolved: https://github.com/pytorch/audio/pull/3230 Reviewed By: hwangjeff Differential Revision: D44642989 Pulled By: mthrok fbshipit-source-id: a90af828c7c576bc0eb505164327365ebaadc471
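A sketch of the usage pattern this fix protects: passing a Python-defined LM inline without keeping a separate reference to it. The class and factory names follow `torchaudio.models.decoder`; the lexicon/token paths are placeholders and `ZeroLM` is a toy LM that scores everything as zero.

```python
# Sketch of passing a Python-defined LM inline; file paths are placeholders.
from torchaudio.models.decoder import ctc_decoder, CTCDecoderLM, CTCDecoderLMState

class ZeroLM(CTCDecoderLM):
    def start(self, start_with_nothing: bool):
        return CTCDecoderLMState()
    def score(self, state, usr_token_idx: int):
        return state.child(usr_token_idx), 0.0
    def finish(self, state):
        return state, 0.0

# Before this fix, the inline ZeroLM() instance could be garbage-collected and
# decoding failed with: Tried to call pure virtual function "LM::start"
decoder = ctc_decoder(lexicon="lexicon.txt", tokens="tokens.txt", lm=ZeroLM())
```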
-
- 01 Apr, 2023 1 commit
-
-
moto authored
Summary: This commit adds a new feature, AudioEffector, which can be used to apply various effects and codecs to waveforms in Tensor form. Under the hood, it uses StreamWriter and StreamReader to apply filters and encode/decode. This is going to replace the deprecated `apply_codec` and `apply_sox_effect_tensor` functions. It can also perform online, chunk-by-chunk filtering. Tutorial to follow. Closes https://github.com/pytorch/audio/issues/3161 Pull Request resolved: https://github.com/pytorch/audio/pull/3163 Reviewed By: hwangjeff Differential Revision: D44576660 Pulled By: mthrok fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb
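A minimal usage sketch; the effect string is illustrative and assumes an FFmpeg build with the corresponding filter available.

```python
# Sketch of AudioEffector: run an FFmpeg filter on a waveform Tensor.
import torch
from torchaudio.io import AudioEffector

waveform = torch.randn(16000, 2)                        # (frames, channels)
effector = AudioEffector(effect="aecho=0.8:0.9:200:0.3")
processed = effector.apply(waveform, sample_rate=16000)
# A codec round-trip can be added the same way, e.g. AudioEffector(format="mp3"),
# given an FFmpeg build with that encoder.
```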
-
- 30 Mar, 2023 2 commits
-
-
moto authored
Summary: This commit adds support for changing the spec of media (such as sample rate, #channels, image size and frame rate) on-the-fly at encoding time. The motivation behind this addition is that certain media formats support only a limited set of specs, and it is cumbersome to require client code to change the spec every time. For example, OPUS supports only a 48 kHz sampling rate, and vorbis only supports stereo. To make it easy to work with media of different formats, this commit makes it so that anything that's not compatible with the format is automatically converted, and allows users to specify an override. A notable implementation detail is that, for sample format and pixel format, the default value of the encoder takes precedence over the source value, while for other attributes like sample rate and #channels, the source value takes precedence as long as it is supported. Pull Request resolved: https://github.com/pytorch/audio/pull/3207 Reviewed By: nateanl Differential Revision: D44439622 Pulled By: mthrok fbshipit-source-id: 09524f201d485d201150481884a3e9e4d2aab081
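A sketch of the on-the-fly conversion, assuming the override is exposed via `encoder_sample_rate` (and, per the follow-up commit below, `encoder_num_channels`) on `add_audio_stream`; the exact keyword names are assumptions, and an FFmpeg build with an Opus encoder is assumed.

```python
# Sketch assuming the override is exposed as `encoder_sample_rate` on add_audio_stream.
import torch
from torchaudio.io import StreamWriter

writer = StreamWriter("out.opus")
# Source is 16 kHz stereo; OPUS only supports 48 kHz, so the writer resamples on the fly.
writer.add_audio_stream(sample_rate=16000, num_channels=2, encoder_sample_rate=48000)
with writer.open():
    writer.write_audio_chunk(0, torch.randn(16000, 2))
```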
-
moto authored
Summary: This commit adds `num_channels` argument, which allows one to change the number of channels on-the-fly. Pull Request resolved: https://github.com/pytorch/audio/pull/3216 Reviewed By: hwangjeff Differential Revision: D44516925 Pulled By: mthrok fbshipit-source-id: 3e5a11b3fdbb19071f712a8148e27aff60341df3
-