- 23 Dec, 2021 2 commits
-
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
hwangjeff authored
Summary: Adds implementation of Conformer module. Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770. Pull Request resolved: https://github.com/pytorch/audio/pull/2068 Reviewed By: mthrok Differential Revision: D33236957 Pulled By: hwangjeff fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6
-
- 21 Dec, 2021 1 commit
-
-
moto authored
Summary: ## bug description When a 24 bits-par-sample audio is loaded via file-like object, the loaded Tensor is wrong. It was fine if the audio is loaded from local file. ## The cause of the bug The core of the sox's decoding mechanism is `sox_read` function, one of which parameter is the maximum number of samples to decode from the given buffer. https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f)] The `sox_read` function is called in what is called `drain` effect, callback and this callback receives output buffer and its size in byte. The previous implementation passed this size value as the argument of `sox_read` for the maximum number of samples to read. Since buffer size is larger than the number of samples fit in the buffer, `sox_read` function always consumed the entire buffer. (This behavior is not wrong except when the input is 24 bits-per-sample and file-like object.) When the input is read from file-like object, inside of drain callback, new data are fetched via Python's `read` method and loaded on fixed-size memory region. The size of this memory region can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`, but the default value is 8096. If the input format is 24 bits-per-sample, the end of memory region does not necessarily correspond to the end of a valid sample. When `sox_read` consumes all the data in the buffer region, the data at the end introduces some unexpected values. This causes the aforementioned bug ## Fix Pass proper (better estimated) maximum number of samples decodable to `sox_read`. Pull Request resolved: https://github.com/pytorch/audio/pull/2084 Reviewed By: carolineechen Differential Revision: D33236947 Pulled By: mthrok fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
-
- 30 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time. For one of the four tests: Before: ``` test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%] ======================== 1 passed in 517.35s (0:08:37) ========================= ``` After: ``` test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%] ======================== 1 passed in 104.59s (0:01:44) ========================= ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2037 Reviewed By: mthrok Differential Revision: D32726213 Pulled By: hwangjeff fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e
-
- 24 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Adds beam search decoder for RNN-T implementation ``torchaudio.prototype.RNNT`` that is TorchScript-able and supports both streaming and non-streaming inference. Pull Request resolved: https://github.com/pytorch/audio/pull/2028 Reviewed By: mthrok Differential Revision: D32627919 Pulled By: hwangjeff fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
-
- 23 Nov, 2021 1 commit
-
-
moto authored
Summary: The sox_effects test in `concurrent.future.ThreadPoolExecutor` started failing since couple of days. While investigate this, skipping the test. Pull Request resolved: https://github.com/pytorch/audio/pull/2025 Reviewed By: nateanl Differential Revision: D32615933 Pulled By: mthrok fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f
-
- 22 Nov, 2021 2 commits
-
-
Zhaoheng Ni authored
Summary: Allow users to use `torch.cfloat` dtype input for MVDR module. It internally convert the spectrogram into `torch.cdouble` and output the tensor with the original dtype of the spectrogram. Pull Request resolved: https://github.com/pytorch/audio/pull/2024 Reviewed By: carolineechen Differential Revision: D32594051 Pulled By: nateanl fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46
-
Zhaoheng Ni authored
Summary: Division first, multiplication second. This helps avoid the value overflow issue. It also helps the ``stv_evd`` solution pass the gradient check. Pull Request resolved: https://github.com/pytorch/audio/pull/2004 Reviewed By: mthrok Differential Revision: D32539827 Pulled By: nateanl fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22
-
- 18 Nov, 2021 2 commits
-
-
hwangjeff authored
Summary: Adds streaming-capable recurrent neural network transducer (RNN-T) model that uses Emformer for its transcription network. Includes two factory functions — one that allows for building a custom model, and one that builds a preconfigured base model. Pull Request resolved: https://github.com/pytorch/audio/pull/2003 Reviewed By: nateanl Differential Revision: D32440879 Pulled By: hwangjeff fbshipit-source-id: 601cb1de368427f25e3b7d120e185960595d2360
-
Facebook Community Bot authored
Co-authored-by:Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
-
- 17 Nov, 2021 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2015 as titled Reviewed By: hwangjeff, mthrok Differential Revision: D32495691 fbshipit-source-id: 60d8a2337585e3147f24ca9f0b6518e30cd9134a
-
- 04 Nov, 2021 2 commits
-
-
Caroline Chen authored
-
moto authored
This commit changes all the `torch.hub` network utility functions to be imported from `torchaudio._internal`, so that later we can replace the function within fbcode.
-
- 03 Nov, 2021 3 commits
-
-
moto authored
Following the plan #1337, this commit drops the support for pseudo complex type from `F.phase_vocoder` and `T.TimeStretch`.
-
moto authored
Following the plan #1337, this commit drops the support for pseudo complex type from `F.spectrogram` and `T.Spectrogram`. It also deprecates the use of `return_complex` argument.
-
moto authored
-
- 02 Nov, 2021 3 commits
- 28 Oct, 2021 1 commit
-
-
S Harish authored
-
- 27 Oct, 2021 1 commit
-
-
moto authored
-
- 25 Oct, 2021 1 commit
-
-
moto authored
-
- 22 Oct, 2021 1 commit
-
-
moto authored
- Make the test support other languages - Fetch tetst asset on-the-fly
-
- 21 Oct, 2021 1 commit
-
-
moto authored
* [BC-breaking] Remove unused dimension from pretrained Wav2Vec2 ASR The Wav2Vec2 ASR pretrained weights originated from fairseq have extra dimension that have nothing to do with the ASR task. https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/data/dictionary.py#L18-L37 which is masked during the loss computation as https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/criterions/ctc.py#L126-L128 This change removes it. * Use '-' for blank token representation.
-
- 15 Oct, 2021 2 commits
-
-
moto authored
Future work items: - length computation of GriffinLim - better way to make InverseMelScale work in inference_mode
-
moto authored
- Move wav2vec2 pretrained weights to `torchaudio.pipelines` namespace to align with #1872. - Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-training model) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR). - Update base URL
-
- 13 Oct, 2021 2 commits
-
-
Caroline Chen authored
-
moto authored
-
- 10 Oct, 2021 1 commit
-
-
moto authored
Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
-
- 08 Oct, 2021 2 commits
- 07 Oct, 2021 2 commits
-
-
moto authored
This commit merges wav2vec2/hubert factory functions for pre-training and fine-tuning. In #1829, we added parameters to customize the models that are not part of architecture, and `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update in release 0.10. The summary of BC-breaking changes on wav2vec2 APIs between 0.9 and 0.10 (when this commit is incorporated) 1. `Wav2Vec2Model.extract_features` In 0.9, it was returning the output from `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of `TransformerEncoder` block. 2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params:float>, aux_num_out: Optional[int]=None)` - `num_out` was renamed to `aux_num_out` and optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning. - Added dropout parameters. -
moto authored
This commit makes the following changes 1. Make the factory function with full customizability public. i.e. `_get_model(...) -> wav2vec2_model(...)`. 2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout). i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)` ### Why? While adding the pre-trained weight support, I realized that separating API for model construction and pre-trained support achieves simple code organization because of the good separation of concern. As mentioned in #1821, in this framework, 1. Model implementation is responsible for computation logic, 2. factory functions are responsible for customizability and model construction, 3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels). (note: for simple models, combining 1 and 2 is also okay.) This means that factory functions has to support all the customizability required by pre-trained weight API. The current implementation uses the internal function like `from .model import Wav2Vec2Model, _get_model`, which is a bit strange. This PR rectifies it by making the mother factory function public. This also clarifies the purpose of having the other factory functions as public API, which is just a syntax sugar for constructing un-trained model with specific architecture. So this commit also adds supplemental parameters to them.
-
- 06 Oct, 2021 5 commits
-
-
kingyiusuen authored
-
moto authored
Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models - Wav2Vec 2.0 Base / Large / Large (LV-60) - XLSR-53
-
hwangjeff authored
Adds an implementation of Emformer, a memory-efficient transformer architecture introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency streaming speech recognition applications.
-
moto authored
This commit adds - HUBERT_LARGE - HUBERT_XLARGE - HUBERT_ASR_XLARGE
-
moto authored
-
- 05 Oct, 2021 2 commits
-
-
moto authored
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/1817 This changes the imports in the `torchaudio` to include the new import locations. ``` codemod -d pytorch/audio --extensions py 'torch.quantization' 'torch.ao.quantization' ``` Reviewed By: mthrok Differential Revision: D31302450 fbshipit-source-id: f31a0d4f453f840ea690edb688555a9d585787b5 Co-authored-by:
Zafar Takhirov <zaf@fb.com>
-