- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort makes a best-effort attempt to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402214 fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
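A small Python illustration of the ordering difference described above (plain `sorted`, not µsort itself):

```python
names = ["FROG", "apple", "frog", "Banana", "Frog"]

# Case-sensitive lexicographical ordering splits differently-cased spellings apart.
print(sorted(names))                 # ['Banana', 'FROG', 'Frog', 'apple', 'frog']

# Case-insensitive ordering keeps "frog", "FROG", and "Frog" next to each other.
print(sorted(names, key=str.lower))  # ['apple', 'Banana', 'FROG', 'frog', 'Frog']
```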
-
- 13 May, 2022 1 commit
-
-
moto authored
Summary: This commit moves the Streaming API out of the prototype module.

* The related classes are renamed as follows:
  - `Streamer` -> `StreamReader`
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

  This change is a preemptive measure for the possibility of adding a `StreamWriter` API.
* Replace the BUILD_FFMPEG build arg with USE_FFMPEG. We are not building FFmpeg, so USE_FFMPEG is more appropriate.

---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs (separate PRs):
- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
  - Rename `Streamer` to `StreamReader`.
  - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
  - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
  - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378 Reviewed By: carolineechen Differential Revision: D36359299 Pulled By: mthrok fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
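A hedged sketch of what updating client code might look like after this change; the new (non-prototype) module location is an assumption:

```python
# Before (prototype API):
#   from torchaudio.prototype.io import Streamer
#   streamer = Streamer("input.mp4")

# After this change (assumed non-prototype location; `StreamReader` was `Streamer`):
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")
```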
-
- 12 May, 2022 3 commits
-
-
moto authored
Summary: This commit updates the lazy module initialization logic for `torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`.

- The modules are importable regardless of optional dependencies, i.e. `import torchaudio.prototype.io` does not trigger the check for optional dependencies.
- Optional dependencies are checked when the actual API is imported for the first time, i.e. `from torchaudio.prototype.io import Streamer` triggers the check for optional dependencies.

The downside is that `import torchaudio.prototype.io.Streamer` no longer works.

## Details:

Starting from Python 3.7, modules can have a `__getattr__` function, which serves as a fallback if the import mechanism cannot find the attribute. This can be used to implement lazy import.

```python
def __getattr__(name):
    global pi
    if name == 'pi':
        import math
        pi = math.pi
        return pi
    raise AttributeError(...)
```

Ref: https://twitter.com/raymondh/status/1094686528440168453

The implementation performs lazy import for the APIs that work with external/optional dependencies. In addition, it ensures that the binding initialization check happens only once.

## Why is this the preferable approach?

Previously, the optional dependencies were checked at the time the module was imported: https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5

As long as this module is in `prototype`, which we ask users to import explicitly, users had control over whether they want to install the optional dependencies. However, this approach only works for one optional dependency per module. Say we add a different I/O library as an optional dependency; we would need to put all of its APIs in a dedicated submodule. This prevents us from having a flat namespace, i.e. the I/O modules with multiple optional dependencies would look like

```python
# Client code
from torchaudio.io.foo import FooFeature
from torchaudio.io.bar import BarFeature
```

whereas the new approach allows

```python
# Client code
from torchaudio.io import FooFeature, BarFeature
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2377 Reviewed By: nateanl Differential Revision: D36305603 Pulled By: mthrok fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae
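A minimal, self-contained sketch of the pattern described above — lazy attribute access combined with a one-time dependency check; `_check_dependency` and the `sqrt` example are purely illustrative, not the torchaudio implementation:

```python
# Illustrative module-level lazy loading with a one-time dependency check.
_checked = False

def _check_dependency():
    global _checked
    if not _checked:
        # In torchaudio this is where the optional binding (e.g. FFmpeg) would be verified,
        # raising a helpful error if it is unavailable.
        _checked = True

def __getattr__(name):
    if name == "sqrt":
        _check_dependency()              # runs at first access, not at module import
        import math
        globals()["sqrt"] = math.sqrt    # cache so __getattr__ is not hit again
        return math.sqrt
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```
-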
Zhaoheng Ni authored
Summary:
- When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, sometimes the value can be negative, which results in an empty label. Such a training example hurts performance after zero-padding (i.e., the labels are all zero for the input waveform). This PR fixes the bug by checking whether `label_start` is negative and changing it to zero if so.
- If `pad` is True, `length` should be the length of each waveform instead of the max length. Fix it so the model ignores the padding component in pre-training.

Pull Request resolved: https://github.com/pytorch/audio/pull/2296 Reviewed By: mthrok Differential Revision: D36323217 Pulled By: nateanl fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
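A hedged sketch of the fix described above; the helper name and call pattern are assumptions, only the `torch.div` formula comes from the summary:

```python
import torch

def align_label_start(audio_start: torch.Tensor, kernel_size: int, sample_rate: int, stride: int) -> torch.Tensor:
    label_start = torch.div(
        audio_start - kernel_size * sample_rate,
        stride * sample_rate,
        rounding_mode="floor",
    )
    # The fix: a negative start index yields an empty (all-zero) label after padding,
    # so clamp it to zero before slicing the label.
    return label_start.clamp(min=0)
```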
-
John Reese authored
Summary: Applies the black-fbsource codemod with the new build of pyfmt. paintitblack Reviewed By: lisroach Differential Revision: D36324783 fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
-
- 11 May, 2022 1 commit
-
-
moto authored
Summary: On CircleCI, Windows unittests are failing for Python 3.7 with `PermissionError` at the end of tests when cleaning up the temporary directory. According to the discussion https://github.com/python/cpython/issues/74168, this is caused by a known issue with `shutil.rmtree`. In that thread it is advised to simply ignore the error, as it is not guaranteed that temp directories are cleaned up. This commit follows the same path and simply ignores the error so that our CI gets back to green. Pull Request resolved: https://github.com/pytorch/audio/pull/2379 Reviewed By: carolineechen Differential Revision: D36305595 Pulled By: mthrok fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5
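A minimal sketch of the workaround, assuming a best-effort temp-directory cleanup (the actual test utility may differ):

```python
import shutil
import tempfile

path = tempfile.mkdtemp()
try:
    ...  # run the test that writes into `path`
finally:
    # On Windows / Python 3.7, rmtree can raise PermissionError; cleanup is best-effort,
    # so the error is ignored instead of failing the test.
    shutil.rmtree(path, ignore_errors=True)
```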
-
- 10 May, 2022 5 commits
-
-
hwangjeff authored
Summary: Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241. Continuation of https://github.com/pytorch/audio/issues/2324. Pull Request resolved: https://github.com/pytorch/audio/pull/2358 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D36137992 Pulled By: hwangjeff fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
-
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2375. The MVDR module internally transforms the dtype of complex tensors to `torch.complex128` for computation and transforms it back to the original dtype before returning the Tensor. However, the conversion back did not take effect because of `specgram_enhanced.to(dtype)`, which should be `specgram_enhanced = specgram_enhanced.to(dtype)`. Fix it to make the output dtype consistent with the original input. Pull Request resolved: https://github.com/pytorch/audio/pull/2376 Reviewed By: hwangjeff Differential Revision: D36280851 Pulled By: nateanl fbshipit-source-id: 553d1b98f899547209a4e3ebc59920c7ef1f3112
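The root cause is that `Tensor.to` is out-of-place; a minimal illustration:

```python
import torch

specgram_enhanced = torch.zeros(3, dtype=torch.complex128)
dtype = torch.complex64

specgram_enhanced.to(dtype)                      # bug: returned tensor is discarded
print(specgram_enhanced.dtype)                   # still torch.complex128

specgram_enhanced = specgram_enhanced.to(dtype)  # fix: reassign the converted tensor
print(specgram_enhanced.dtype)                   # torch.complex64
```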
-
Zhaoheng Ni authored
Summary: Add a new design of MVDR module. The `RTFMVDR` module supports the method based on the relative transfer function (RTF) and the power spectral density (PSD) matrix of noise.

The input arguments are:
- multi-channel spectrum
- RTF vector of the target speech
- PSD matrix of noise
- reference channel in the microphone array
- `diagonal_loading` option to enable or disable diagonal loading in matrix inverse computation
- `diag_eps` for computing the inverse of the matrix
- `eps` for computing the beamforming weight

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2368 Reviewed By: carolineechen Differential Revision: D36214940 Pulled By: nateanl fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
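A hedged usage sketch based on the argument list above; the import location, helper functionals, and tensor shapes are assumptions:

```python
import torch
import torchaudio
import torchaudio.functional as F

channel, freq, time = 4, 257, 100
specgram = torch.randn(channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
noise = torch.randn(channel, freq, time, dtype=torch.cfloat)

psd_n = F.psd(noise)               # PSD matrix of noise: (freq, channel, channel)
rtf = F.rtf_evd(F.psd(specgram))   # RTF vector of the target speech: (freq, channel)

mvdr = torchaudio.transforms.RTFMVDR()
enhanced = mvdr(specgram, rtf, psd_n, reference_channel=0)  # single-channel complex spectrum
print(enhanced.shape)              # (freq, time)
```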
-
Zhaoheng Ni authored
Summary: When computing the MVDR beamforming weights using the power iteration method, the PSD matrix of noise can be applied with diagonal loading to improve robustness. This is also applicable to computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example). This also aligns with the current `torchaudio.transforms.MVDR` module for consistency. This PR adds the `diagonal_loading` argument, with `True` as the default value, to `torchaudio.functional.rtf_power`. Pull Request resolved: https://github.com/pytorch/audio/pull/2369 Reviewed By: carolineechen Differential Revision: D36204130 Pulled By: nateanl fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
-
Zhaoheng Ni authored
Summary: Add a new design of MVDR module. The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).

The input arguments are:
- multi-channel spectrum
- PSD matrix of target speech
- PSD matrix of noise
- reference channel in the microphone array
- `diagonal_loading` option to enable or disable diagonal loading in matrix inverse computation
- `diag_eps` for computing the inverse of the matrix
- `eps` for computing the beamforming weight

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2367 Reviewed By: hwangjeff Differential Revision: D36198015 Pulled By: nateanl fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
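Similarly, a hedged sketch for `SoudenMVDR`; the import location and tensor shapes are assumptions:

```python
import torch
import torchaudio
import torchaudio.functional as F

channel, freq, time = 4, 257, 100
specgram = torch.randn(channel, freq, time, dtype=torch.cfloat)      # multi-channel spectrum
psd_s = F.psd(torch.randn(channel, freq, time, dtype=torch.cfloat))  # PSD of target speech
psd_n = F.psd(torch.randn(channel, freq, time, dtype=torch.cfloat))  # PSD of noise

mvdr = torchaudio.transforms.SoudenMVDR()
enhanced = mvdr(specgram, psd_s, psd_n, reference_channel=0)          # (freq, time), complex-valued
```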
-
- 26 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for lexicon-free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation.

Follow-ups:
- Add pretrained LM support for lexicon-free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
- 18 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Implementation adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py). Modifying the s3prl downstream expert to [this](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5) and using this dataset implementation produces the same results as the original s3prl pipeline. Pull Request resolved: https://github.com/pytorch/audio/pull/2290 Reviewed By: nateanl Differential Revision: D35692551 Pulled By: carolineechen fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c
-
- 12 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds the Conformer RNN-T model as a prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, the latter of which instantiates a baseline version of the model. Also includes the following:
- Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
- Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
- Introduces tests for `conformer_rnnt_model`.
- Adds docs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2322 Reviewed By: xiaohui-zhang Differential Revision: D35565987 Pulled By: hwangjeff fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
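A hedged usage sketch; the prototype import path is an assumption based on the summary:

```python
from torchaudio.prototype.models import conformer_rnnt_base  # assumed prototype location

model = conformer_rnnt_base()  # baseline configuration from the factory function
print(sum(p.numel() for p in model.parameters()))  # quick sanity check of model size
```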
-
- 08 Apr, 2022 1 commit
-
-
moto authored
Summary: Add badges of supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to Sphinx. APIs with these directives get badges (based on shields.io) that link to the page describing these features. Continuation of https://github.com/pytorch/audio/issues/2316. Dtypes are excluded for further improvement, and badges are added to most functionals/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
-
- 01 Apr, 2022 1 commit
-
-
moto authored
Summary: The `transforms.batch_consistency_test.TestTransforms` test is failing on Windows. https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272

```
>       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 28 / 196608 (0.0%)
E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
```

The value of atol==1e-08 seems very strict, but all the other batch consistency tests are passing. The violation is for a very small number of samples, which looks suspicious, but I think it is okay to reduce it to `1e-06` for Windows. `1e-06` is still stricter than the majority of the comparison tests we have.

Pull Request resolved: https://github.com/pytorch/audio/pull/2305 Reviewed By: hwangjeff Differential Revision: D35298056 Pulled By: mthrok fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
-
- 31 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit updates the `get_sinusoid` function in the test utility so that when multiple channels are requested, non-primary channels have a randomized initial phase. This adds some variety to the test data and should not break the tests. Currently `get_sinusoid` returns identical waveforms for all the channels. The multi-channel support was added just to mock input data so that it is easy to test features with multi-channel inputs, so tests should not expect all channels to be identical. When working on numerical parity, it is more useful if the raw waveforms are somewhat different. Image: waveforms generated by `get_sinusoid` after the change. Left: 1st channel, right: 2nd channel <img width="524" alt="Screen Shot 2022-03-31 at 10 06 17 AM" src="https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2301 Reviewed By: hwangjeff Differential Revision: D35291689 Pulled By: mthrok fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd
-
moto authored
Summary: Tests on `torchaudio.compliance.kaldi` were scattered across different places. This commit puts all of them in a dedicated `test/torchaudio_unittest/compliance/kaldi/` directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2303 Reviewed By: nateanl Differential Revision: D35288400 Pulled By: mthrok fbshipit-source-id: 1426f236bc7786539d7a3110f992ad6220a52f46
-
- 04 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit adds the following behavior to `seek` so that `seek` works after a frame is decoded:
1. Flush the decoder buffer.
2. Recreate filter graphs (so that internal state is re-initialized).
3. Discard the buffered tensors (decoded chunks).

It also disallows negative values for the seek timestamp.

Pull Request resolved: https://github.com/pytorch/audio/pull/2264 Reviewed By: carolineechen Differential Revision: D34497826 Pulled By: mthrok fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
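A hedged sketch of the new behavior; the media path and stream setup are placeholders:

```python
from torchaudio.prototype.io import Streamer

streamer = Streamer("example.mp4")
streamer.seek(5.0)    # valid: flushes the decoder, recreates filter graphs, drops buffered chunks
streamer.seek(-1.0)   # now raises an error: negative timestamps are rejected
```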
-
moto authored
Summary: The `torchaudio.prototype.io.Streamer` class takes context-dependent options as the `option` argument, in the form of a mapping of strings. Currently there is no check whether the provided options are valid for the given input. This commit adds the check and raises an error if an invalid option is given. This is analogous to `ffmpeg` command error handling.

```
$ ffmpeg -foo ...
Unrecognized option 'foo'.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2263 Reviewed By: hwangjeff Differential Revision: D34495111 Pulled By: mthrok fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
-
- 26 Feb, 2022 2 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2261 Enables prototype ffmpeg io tests in fbcode. Reviewed By: nateanl Differential Revision: D33698353 fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036
-
Zhaoheng Ni authored
Summary: This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``. The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum. Pull Request resolved: https://github.com/pytorch/audio/pull/2232 Reviewed By: mthrok Differential Revision: D34474561 Pulled By: nateanl fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
-
- 25 Feb, 2022 5 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``rtf_power`` method to ``torchaudio.functional``. The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206). [This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English. The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, number of iterations, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2231 Reviewed By: mthrok Differential Revision: D34474503 Pulled By: nateanl fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb
-
Zhaoheng Ni authored
Summary: This PR adds `rtf_evd` method to `torchaudio.functional`. The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. The input argument is the power spectral density (PSD) matrix of the target speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2230 Reviewed By: mthrok Differential Revision: D34474188 Pulled By: nateanl fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134
-
Zhaoheng Ni authored
Summary: This PR adds ``mvdr_weights_rtf`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution that applies relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for the reference. The input arguments are the complex-valued RTF Tensor of the target speech, power spectral density (PSD) matrix of noise, int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2229 Reviewed By: mthrok Differential Revision: D34474119 Pulled By: nateanl fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2
-
Zhaoheng Ni authored
Summary: This PR adds ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
-
Zhaoheng Ni authored
Summary: This PR adds ``psd`` method to ``torchaudio.functional``. It computes the power spectral density (PSD) matrix of the complex-valued spectrum. The method also supports normalization of the Time-Frequency mask. Pull Request resolved: https://github.com/pytorch/audio/pull/2227 Reviewed By: mthrok Differential Revision: D34473908 Pulled By: nateanl fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9
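Taken together with the ``mvdr_weights_souden`` and ``apply_beamforming`` commits above, a hedged end-to-end sketch of the functional beamforming path (shapes and mask values are assumptions):

```python
import torch
import torchaudio.functional as F

channel, freq, time = 4, 257, 100
specgram = torch.randn(channel, freq, time, dtype=torch.cfloat)    # multi-channel noisy spectrum
mask_s = torch.rand(freq, time)                                     # T-F mask of target speech
mask_n = torch.rand(freq, time)                                     # T-F mask of noise

psd_s = F.psd(specgram, mask_s)                                     # (freq, channel, channel)
psd_n = F.psd(specgram, mask_n)
weights = F.mvdr_weights_souden(psd_s, psd_n, reference_channel=0)  # (freq, channel)
enhanced = F.apply_beamforming(weights, specgram)                   # (freq, time), single channel
```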
-
- 24 Feb, 2022 1 commit
-
-
Andrey Talman authored
Summary: Adding py3.10 to audio Pull Request resolved: https://github.com/pytorch/audio/pull/2224 Reviewed By: malfet, atalman, mthrok Differential Revision: D34442377 Pulled By: seemethere fbshipit-source-id: 2656de73427063958d609a74c01b526a476cb06a
-
- 17 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: In the batch_consistency tests, the `assert_batch_consistency` method only accepts a single Tensor, which is not applicable to some methods. For example, `lfilter` and `filtfilt` require three Tensors as arguments, hence they don't use `assert_batch_consistency` in the tests. This PR refactors the test to accept a tuple of Tensors which have a `batch` dimension. Other arguments such as `int` or `str` are given as `*args` after the tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2245 Reviewed By: mthrok Differential Revision: D34273035 Pulled By: nateanl fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056
-
Zhaoheng Ni authored
Summary:
- Refactor the current `LibriSpeechRNNTModule`'s unit test.
- Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`.
- Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2240 Reviewed By: mthrok Differential Revision: D34285195 Pulled By: nateanl fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f
-
- 16 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: In the torchscript_consistency tests, the `func` in each test method accepts only one `tensor` as the argument; the other arguments of the `F.xyz` method need to be defined inside `func`. If there is no `Tensor` argument in `F.xyz`, the tests use a `dummy` tensor which is not used anywhere. In this PR, we refactor ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of just one `tensor`. Pull Request resolved: https://github.com/pytorch/audio/pull/2246 Reviewed By: carolineechen Differential Revision: D34273057 Pulled By: nateanl fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b
-
Zhaoheng Ni authored
Summary: In the autograd tests, to guarantee precision, the dtypes of Tensors are converted to `torch.float64` if they are real. However, the complex dtype is not considered. This PR adds `self.complex_dtype` support to the inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/2244 Reviewed By: mthrok Differential Revision: D34272998 Pulled By: nateanl fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
-
- 15 Feb, 2022 1 commit
-
-
hwangjeff authored
Summary: Orders and names Conformer's initializer args to be more consistent with Emformer's. Pull Request resolved: https://github.com/pytorch/audio/pull/2223 Reviewed By: mthrok Differential Revision: D34226177 Pulled By: hwangjeff fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829
-
- 11 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: Adds fixed random seed to Emformer RNN-T training recipe test. Pull Request resolved: https://github.com/pytorch/audio/pull/2220 Reviewed By: nateanl Differential Revision: D34180644 Pulled By: hwangjeff fbshipit-source-id: 2dc364f3f7cd666fa490514ae460538231c097e9
-
hwangjeff authored
Summary: Adds unit tests for Emformer RNN-T LibriSpeech recipe. Also makes changes to recipe to resolve errors with pickling lambda functions in Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/2216 Reviewed By: nateanl Differential Revision: D34171480 Pulled By: hwangjeff fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e
-
- 09 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary:
- Make `segment_length` a required argument rather than an optional one, to force users to consciously choose input segment lengths for their use cases.
- Clarify expected input shapes in API documentation.
- Adjust `infer` tests to reflect expected usage.
- Add assertion for input shape for `infer`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2207 Reviewed By: mthrok Differential Revision: D34101205 Pulled By: hwangjeff fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
-
hwangjeff authored
Summary: Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2208 Reviewed By: mthrok Differential Revision: D34099793 Pulled By: hwangjeff fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
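A hedged illustration of the kind of callsite change involved; the specific function shown is an example, not necessarily one touched by this PR:

```python
import librosa

# Positional style used previously:
#   mel_fb = librosa.filters.mel(16000, 400, n_mels=80)
# Keyword style required by librosa >= 0.9 (and accepted by older versions too):
mel_fb = librosa.filters.mel(sr=16000, n_fft=400, n_mels=80)
```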
-
- 02 Feb, 2022 1 commit
-
-
moto authored
Summary: This PR adds the prototype streaming API. The implementation is based on ffmpeg libraries. For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2164 Reviewed By: hwangjeff Differential Revision: D33934457 Pulled By: mthrok fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe
-
- 01 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2187 Reviewed By: nateanl, mthrok Differential Revision: D33918092 Pulled By: hwangjeff fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
-
Caroline Chen authored
Summary: Adds a timesteps field to CTC decoder hypotheses, corresponding to the time steps at which non-blank tokens occur. Pull Request resolved: https://github.com/pytorch/audio/pull/2184 Reviewed By: mthrok Differential Revision: D33905530 Pulled By: carolineechen fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
-