"projects/vscode:/vscode.git/clone" did not exist on "335d439385bbb968f6cdf1c4b7da5d3c6959d7a5"
- 01 Jun, 2022 1 commit
-
-
Caroline Chen authored
Summary: Move the CTC beam search decoder out of prototype to the new `torchaudio.models.decoder` module. hwangjeff mthrok any thoughts on the new module + naming, and whether we should move the RNN-T beam search here as well? Pull Request resolved: https://github.com/pytorch/audio/pull/2410 Reviewed By: mthrok Differential Revision: D36784521 Pulled By: carolineechen fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
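After this move, the decoder lives under `torchaudio.models.decoder`. A minimal, hedged usage sketch; the file names and parameter values below are placeholders, not taken from the PR:

```python
from torchaudio.models.decoder import ctc_decoder

# "lexicon.txt" and "tokens.txt" are hypothetical asset files.
decoder = ctc_decoder(
    lexicon="lexicon.txt",
    tokens="tokens.txt",
    nbest=3,
    beam_size=50,
)
# emissions: CPU float tensor of CTC log-probs, shape (batch, frame, num_tokens)
# hypotheses = decoder(emissions)
```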
-
- 31 May, 2022 1 commit
-
-
moto authored
Summary: Extracted from https://github.com/pytorch/audio/issues/2419. Moves the sox_io availability failure from the C++ layer to the Python layer. Pull Request resolved: https://github.com/pytorch/audio/pull/2423 Reviewed By: carolineechen Differential Revision: D36766152 Pulled By: mthrok fbshipit-source-id: 53f897a608e97b81ebe5df29577374d88ce178f3
-
- 29 May, 2022 1 commit
-
-
moto authored
Summary: Add num_frames and bits_per_sample to match the current `torchaudio.info` capability. Pull Request resolved: https://github.com/pytorch/audio/pull/2418 Reviewed By: carolineechen Differential Revision: D36749077 Pulled By: mthrok fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713
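For reference, these fields mirror what `torchaudio.info` already exposes; a minimal sketch (the file path is a placeholder):

```python
import torchaudio

meta = torchaudio.info("sample.wav")
# AudioMetaData exposes, among other fields, the two this commit adds elsewhere.
print(meta.sample_rate, meta.num_channels, meta.num_frames, meta.bits_per_sample)
```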
-
- 23 May, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: - The multi-channel functions only support complex-valued tensors for spectrograms and PSD matrices. - The mask can be real-valued or complex-valued, hence there is no explicit assertion for the mask. - The shapes of input Tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc. - The autograd unittest of `apply_beamforming` has wrong dimensions for beamform_weights, detected by the assertion check. Fix it in this PR. Pull Request resolved: https://github.com/pytorch/audio/pull/2401 Reviewed By: carolineechen Differential Revision: D36597689 Pulled By: nateanl fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6
-
Zhaoheng Ni authored
Summary: The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is the supervised subset of the [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the supervised subset from the unsupervised one, it is clearer to put it in a separate dataset class for fine-tuning purposes. It contains "10 min", "1 hour", and "10 hour" splits. Pull Request resolved: https://github.com/pytorch/audio/pull/2302 Reviewed By: mthrok Differential Revision: D36388188 Pulled By: nateanl fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249
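A hedged usage sketch; the root path is a placeholder and the exact subset identifier strings are assumptions rather than values quoted from the PR:

```python
from torchaudio.datasets import LibriLightLimited

dataset = LibriLightLimited("path/to/data", subset="10min", download=True)
# Items follow the LibriSpeech-style tuple layout.
waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
```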
-
- 21 May, 2022 1 commit
-
-
moto authored
Summary: This commit adds file-like object support to the Streaming API. ## Features - File-like objects are expected to implement `read(self, n)`. - Additionally, `seek(self, offset, whence)` is used if available. - Without the `seek` method, some formats cannot be decoded properly. - To work around this, one can use the existing `decoder` option to tell it what decoder to use. - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`. - So as to keep the arguments common to both audio and video in front of the rest, the order of the arguments is changed. - The `dtype` and `format` arguments were also changed to make them consistent across audio/video methods. ## Code structure The approach is very similar to how file-like objects are supported in sox-based I/O. In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind; if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11. ## Refactoring involved - Extracted to https://github.com/pytorch/audio/issues/2402 - Some implementation in the original TorchBind surface layer is converted to a Wrapper class so that it can be re-used from the PyBind11 bindings. The wrapper class serves to simplify the binding. - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python. - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to the `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods deal with them directly. ## TODO: - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode it (with/without HW decoding). Pull Request resolved: https://github.com/pytorch/audio/pull/2400 Reviewed By: carolineechen Differential Revision: D36520073 Pulled By: mthrok fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
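A hedged sketch of the new capability; the file name below is a placeholder, and any object exposing `read(self, n)` (and optionally `seek`) should work the same way:

```python
from torchaudio.io import StreamReader

# Decode from a file-like object instead of a path/URL.
with open("input.mp3", "rb") as src:
    reader = StreamReader(src)
    reader.add_basic_audio_stream(frames_per_chunk=4096, sample_rate=16000)
    for (chunk,) in reader.stream():
        pass  # chunk: Tensor of shape (frames, channels)
```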
-
- 20 May, 2022 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2392 Refactors LibriSpeech tests to accommodate different dataset classes Reviewed By: xiaohui-zhang Differential Revision: D36387835 fbshipit-source-id: 73b4e7565b4a077b25f036f4bd854ac7f2194b28
-
- 19 May, 2022 1 commit
-
-
moto authored
Summary: * Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be re-used in PyBind11. * Move the `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This will make the PyBind11-based binding simpler, as it does not need to deal with dtype. * Move the `add_[audio|video]_stream` wrapper signature to the Streamer core, so that Streamer directly deals with `c10::optional`.† † Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as the empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the filter expression that is actually being used. Ref https://github.com/pytorch/audio/issues/2400 Pull Request resolved: https://github.com/pytorch/audio/pull/2402 Reviewed By: nateanl Differential Revision: D36488808 Pulled By: mthrok fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402214 fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
-
- 13 May, 2022 1 commit
-
-
moto authored
Summary: This commit moves the Streaming API out of the prototype module. * The related classes are renamed as follows: - `Streamer` -> `StreamReader` - `SourceStream` -> `StreamReaderSourceStream` - `SourceAudioStream` -> `StreamReaderSourceAudioStream` - `SourceVideoStream` -> `StreamReaderSourceVideoStream` - `OutputStream` -> `StreamReaderOutputStream` This change is a preemptive measure for the possible addition of a `StreamWriter` API. * Replace the BUILD_FFMPEG build arg with USE_FFMPEG; we are not building FFmpeg, so USE_FFMPEG is more appropriate. --- After https://github.com/pytorch/audio/issues/2377 Remaining TODOs (different PRs): - [ ] Introduce `is_ffmpeg_binding_available` function. - [ ] Refactor C++ code: - Rename `Streamer` to `StreamReader`. - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`. - Rename `prototype.cpp` to `stream_reader_binding.cpp`. - Introduce `stream_reader` directory. - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381) Pull Request resolved: https://github.com/pytorch/audio/pull/2378 Reviewed By: carolineechen Differential Revision: D36359299 Pulled By: mthrok fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
-
- 12 May, 2022 3 commits
-
-
moto authored
Summary: This commit updates the lazy module initialization logic for `torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`. - The modules are importable regardless of optional dependencies, i.e. `import torchaudio.prototype.io` does not trigger the check for optional dependencies. - Optional dependencies are checked when the actual API is imported for the first time, i.e. `from torchaudio.prototype.io import Streamer` triggers the check for optional dependencies. The downside is that `import torchaudio.prototype.io.Streamer` no longer works. ## Details: Starting from Python 3.7, modules can have a `__getattr__` function, which serves as a fallback if the import mechanism cannot find the attribute. This can be used to implement lazy import.

```python
def __getattr__(name):
    global pi
    if name == 'pi':
        import math
        pi = math.pi
        return pi
    raise AttributeError(...)
```

Ref: https://twitter.com/raymondh/status/1094686528440168453 The implementation performs lazy import for the APIs that work with external/optional dependencies. In addition, it also ensures that the binding is initialized only once. ## Why is this the preferable approach? Previously, the optional dependencies were checked at the time the module was imported: https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5 As long as this module is in `prototype`, which we ask users to import explicitly, users had control over whether they want to install the optional dependencies. This approach only works for one optional dependency per module. Say we add a different I/O library as an optional dependency; we would need to put all the APIs in a dedicated submodule. This prevents us from having a flat namespace, i.e. the I/O modules with multiple optional dependencies would look like

```python
# Client code
from torchaudio.io.foo import FooFeature
from torchaudio.io.bar import BarFeature
```

whereas the new approach allows

```python
# Client code
from torchaudio.io import FooFeature, BarFeature
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2377 Reviewed By: nateanl Differential Revision: D36305603 Pulled By: mthrok fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae
-
Zhaoheng Ni authored
Summary: - When cropping the waveform and the corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, sometimes the value can be negative, which results in an empty label. Such a training example hurts performance after zero-padding (i.e., the labels are all zero for the input waveform). This PR fixes the bug by checking if `label_start` is negative, and changing it to zero if so. - If `pad` is True, the `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/2296 Reviewed By: mthrok Differential Revision: D36323217 Pulled By: nateanl fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
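An illustrative sketch of the guard described above; the numeric values are placeholders, not the actual HuBERT pre-processing configuration:

```python
import torch

kernel_size, sample_rate, stride = 25, 16, 20
audio_start = torch.tensor(100)
label_start = torch.div(
    audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor"
)
# Guard against a negative start, which would otherwise yield an empty label slice.
label_start = torch.clamp(label_start, min=0)
```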
-
John Reese authored
Summary: Applies the black-fbsource codemod with the new build of pyfmt. paintitblack Reviewed By: lisroach Differential Revision: D36324783 fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
-
- 11 May, 2022 2 commits
-
-
moto authored
Summary: Conda package build performs a simple smoke test, which is different from the smoke_test jobs we define on our CI. Currently the Conda packaging smoke test verifies the importability of `torchaudio.prototype.io`, which requires FFmpeg 4. 1. We list FFmpeg 4 as a runtime requirement, but this means that conda's dependency resolver takes FFmpeg 4 into consideration. FFmpeg 5 was released this year, and we can expect that the user base will gradually move to it. If the user environment has some constraint on FFmpeg, torchaudio will conflict with it, which will prevent users from installing torchaudio. 2. In #2377 the way the optional dependency is checked/initialized is changed, so this Conda smoke test will no longer check the integrity of the FFmpeg libraries. To solve the issues above, this commit moves the part that tests integrity with the FFmpeg libraries to the smoke test we define on CircleCI. Pull Request resolved: https://github.com/pytorch/audio/pull/2381 Reviewed By: carolineechen Differential Revision: D36323706 Pulled By: mthrok fbshipit-source-id: 57ca816e0f3ad8e16d21e56062f6ed8a09ab93a3
-
moto authored
Summary: On CircleCI, Windows unittests are failing for Python 3.7 with `PermissionError` at the end of the test when it cleans up the temporary directory. According to the discussion https://github.com/python/cpython/issues/74168, this is caused by a known issue with `shutil.rmtree`. In the above thread it is advised to simply ignore the error, as it is not guaranteed that temp directories are cleaned up. This commit follows the same path and simply ignores the error so that our CI gets back to green. Pull Request resolved: https://github.com/pytorch/audio/pull/2379 Reviewed By: carolineechen Differential Revision: D36305595 Pulled By: mthrok fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5
-
- 10 May, 2022 5 commits
-
-
hwangjeff authored
Summary: Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241. Continuation of https://github.com/pytorch/audio/issues/2324. Pull Request resolved: https://github.com/pytorch/audio/pull/2358 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D36137992 Pulled By: hwangjeff fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
-
Zhaoheng Ni authored
Summary: Address https://github.com/pytorch/audio/issues/2375 The MVDR module internally transforms the dtype of complex tensors to `torch.complex128` for computation and transforms it back to the original dtype before returning the Tensor. However, it didn't convert back successfully due to `specgram_enhanced.to(dtype)`, which should be `specgram_enhanced = specgram_enhanced.to(dtype)`. Fix it to make the output dtype consistent with the original input. Pull Request resolved: https://github.com/pytorch/audio/pull/2376 Reviewed By: hwangjeff Differential Revision: D36280851 Pulled By: nateanl fbshipit-source-id: 553d1b98f899547209a4e3ebc59920c7ef1f3112
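The root cause is that `Tensor.to` returns a new tensor rather than converting in place; a minimal illustration:

```python
import torch

specgram_enhanced = torch.zeros(3, dtype=torch.complex128)
specgram_enhanced.to(torch.complex64)                      # result discarded, dtype unchanged
specgram_enhanced = specgram_enhanced.to(torch.complex64)  # reassignment actually converts
print(specgram_enhanced.dtype)  # torch.complex64
```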
-
Zhaoheng Ni authored
Summary: Add a new design of the MVDR module. The RTFMVDR module supports the method based on the relative transfer function (RTF) and the power spectral density (PSD) matrix of noise. The input arguments are: - the multi-channel spectrum; - the RTF vector of the target speech; - the PSD matrix of noise; - the reference channel in the microphone array; - the diagonal_loading option to enable or disable diagonal loading in the matrix inverse computation; - diag_eps for computing the inverse of the matrix; - eps for computing the beamforming weight. The output of the module is the single-channel complex-valued spectrum of the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2368 Reviewed By: carolineechen Differential Revision: D36214940 Pulled By: nateanl fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
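A hedged usage sketch with random placeholder tensors; the shapes follow the argument list above, and the exact forward signature is an assumption rather than a quote from the PR:

```python
import torch
import torchaudio

channel, freq, time = 4, 257, 100
specgram = torch.rand(channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
rtf = torch.rand(freq, channel, dtype=torch.cfloat)             # RTF vector of target speech
psd_n = torch.rand(freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise

mvdr = torchaudio.transforms.RTFMVDR()
enhanced = mvdr(specgram, rtf, psd_n, reference_channel=0)      # single-channel (freq, time)
```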
-
Zhaoheng Ni authored
Summary: When computing the MVDR beamforming weights using the power iteration method, diagonal loading can be applied to the PSD matrix of noise to improve robustness. This is also applicable to computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example). This also aligns with the current `torchaudio.transforms.MVDR` module to keep consistency. This PR adds the `diagonal_loading` argument, with `True` as the default value, to `torchaudio.functional.rtf_power`. Pull Request resolved: https://github.com/pytorch/audio/pull/2369 Reviewed By: carolineechen Differential Revision: D36204130 Pulled By: nateanl fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
-
Zhaoheng Ni authored
Summary: Add a new design of the MVDR module. The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are: - the multi-channel spectrum; - the PSD matrix of the target speech; - the PSD matrix of noise; - the reference channel in the microphone array; - the diagonal_loading option to enable or disable diagonal loading in the matrix inverse computation; - diag_eps for computing the inverse of the matrix; - eps for computing the beamforming weight. The output of the module is the single-channel complex-valued spectrum of the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2367 Reviewed By: hwangjeff Differential Revision: D36198015 Pulled By: nateanl fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
-
- 06 May, 2022 1 commit
-
-
moto authored
Summary: The smoke test jobs simply perform `import torchaudio` to check that the package artifacts are sane. Originally, the CI was executing it in the root directory. This was fine unless the source code was checked out. When the source code is checked out, performing `import torchaudio` in the root directory imports the source torchaudio directory instead of the installed package. This error is difficult to notice, so this commit introduces a common script to perform the smoke test while moving out of the root directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2365 Reviewed By: carolineechen Differential Revision: D36202069 Pulled By: mthrok fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
-
- 26 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for lexicon-free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation. Follow-ups: - Add pretrained LM support for lexicon-free decoding - Add an example in the tutorial - Replace the flashlight C++ source code with the flashlight text submodule - [optional] fairseq compatibility test Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
- 21 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2339 Reviewed By: nateanl Differential Revision: D35806529 Pulled By: hwangjeff fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
-
- 18 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Implementation adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py). After modifying the s3prl downstream expert as in [this commit](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5), using this dataset implementation produces the same results as the original s3prl pipeline. Pull Request resolved: https://github.com/pytorch/audio/pull/2290 Reviewed By: nateanl Differential Revision: D35692551 Pulled By: carolineechen fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c
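A hedged usage sketch for the new dataset; the root path is a placeholder, and the subset identifier and returned tuple layout are assumptions, not values quoted from the PR:

```python
from torchaudio.datasets import QUESST14

dataset = QUESST14("path/to/data", subset="docs", download=True)
waveform, sample_rate, file_name = dataset[0]
```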
-
- 12 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds Conformer RNN-T model as prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, which instantiates a baseline version of the model. Also includes the following: - Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances. - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`. - Introduces tests for `conformer_rnnt_model`. - Adds docs. Pull Request resolved: https://github.com/pytorch/audio/pull/2322 Reviewed By: xiaohui-zhang Differential Revision: D35565987 Pulled By: hwangjeff fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
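A hedged sketch of the two factory functions mentioned above; only instantiation is shown, with the parameter count as a quick sanity check:

```python
from torchaudio.prototype.models import conformer_rnnt_base

# The baseline factory takes no arguments; conformer_rnnt_model exposes the
# full set of hyperparameters for custom configurations.
model = conformer_rnnt_base()
print(sum(p.numel() for p in model.parameters()))
```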
-
- 08 Apr, 2022 1 commit
-
-
moto authored
Summary: Add badges of supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to Sphinx. APIs with these directives will have badges (based off of shields.io) which link to the page describing these features. Continuation of https://github.com/pytorch/audio/issues/2316 Dtypes are excluded for further improvement, and badges are added to most functionals/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
-
- 01 Apr, 2022 1 commit
-
-
moto authored
Summary: The `transforms.batch_consistency_test.TestTransforms` test is failing for Windows. https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272

```
>       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 28 / 196608 (0.0%)
E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
```

The value of atol==1e-08 seems very strict, but all the other batch consistency tests are passing. The violation is for a very small number of samples, which looks suspicious, but I think it is okay to reduce it to `1e-06` for Windows. `1e-06` is still stricter than the majority of the comparison tests we have. Pull Request resolved: https://github.com/pytorch/audio/pull/2305 Reviewed By: hwangjeff Differential Revision: D35298056 Pulled By: mthrok fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
-
- 31 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit updates the `get_sinusoid` function in the test utility so that when multiple channels are requested, non-primal channels have a randomized initial phase. This adds some variety in test data, which should not break the tests. Currently `get_sinusoid` returns identical waveforms for all the channels. This multi-channel support was added just to mock the input data so that it is easy to test features with multi-channel inputs, so tests should not be expecting all channels to be identical. When working on numerical parity, it is more useful if the raw waveforms are somewhat different. Image: waveforms generated by `get_sinusoid` after the change (left: 1st channel, right: 2nd channel): https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png Pull Request resolved: https://github.com/pytorch/audio/pull/2301 Reviewed By: hwangjeff Differential Revision: D35291689 Pulled By: mthrok fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd
-
moto authored
Summary: Tests on `torchaudio.compliance.kaldi` were scattered across different places. This commit puts all of them in the dedicated `test/torchaudio_unittest/compliance/kaldi/` directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2303 Reviewed By: nateanl Differential Revision: D35288400 Pulled By: mthrok fbshipit-source-id: 1426f236bc7786539d7a3110f992ad6220a52f46
-
- 25 Mar, 2022 1 commit
-
-
Caroline Chen authored
Summary: Adds a function to download pretrained files for the LibriSpeech 3-gram/4-gram KenLM, along with tests and an updated tutorial. Pull Request resolved: https://github.com/pytorch/audio/pull/2275 Reviewed By: mthrok Differential Revision: D35115418 Pulled By: carolineechen fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0
-
- 22 Mar, 2022 1 commit
-
-
moto authored
Summary: In recent updates, torchaudio added features that download assets/models from download.pytorch.org/torchaudio. To reduce code duplication, the implementation uses utilities from ``torch.hub``, but still, there are patterns repeated in implementing the fetch mechanism, notably cache and local file path handling. This commit introduces a utility function that handles download/cache/local path management and can be used for fetching pre-trained model data. Pull Request resolved: https://github.com/pytorch/audio/pull/2283 Reviewed By: carolineechen Differential Revision: D35050469 Pulled By: mthrok fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b
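An illustrative sketch of the download/cache pattern described above; the helper name and cache location are assumptions, not torchaudio's actual internal API:

```python
import os
import torch

def _fetch_asset(url: str, filename: str) -> str:
    # Hypothetical helper: cache under the torch.hub directory, download on miss.
    cache_dir = os.path.join(torch.hub.get_dir(), "torchaudio")
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        torch.hub.download_url_to_file(url, path)
    return path
```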
-
- 04 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit adds the following behavior to `seek` so that `seek` works after a frame is decoded: 1. Flush the decoder buffer. 2. Recreate the filter graphs (so that internal state is re-initialized). 3. Discard the buffered tensors (decoded chunks). It also disallows negative values for the seek timestamp. Pull Request resolved: https://github.com/pytorch/audio/pull/2264 Reviewed By: carolineechen Differential Revision: D34497826 Pulled By: mthrok fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
-
moto authored
Summary: The `torchaudio.prototype.io.Streamer` class takes context-dependent options as the `option` argument in the form of mappings of strings. Currently there is no check whether the provided options are valid for the given input. This commit adds the check and raises an error if an invalid option is given. This is analogous to `ffmpeg` command error handling:

```
$ ffmpeg -foo ...
Unrecognized option 'foo'.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2263 Reviewed By: hwangjeff Differential Revision: D34495111 Pulled By: mthrok fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
-
- 26 Feb, 2022 2 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2261 Enables prototype ffmpeg io tests in fbcode. Reviewed By: nateanl Differential Revision: D33698353 fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036
-
Zhaoheng Ni authored
Summary: This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``. The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum. Pull Request resolved: https://github.com/pytorch/audio/pull/2232 Reviewed By: mthrok Differential Revision: D34474561 Pulled By: nateanl fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
-
- 25 Feb, 2022 5 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``rtf_power`` method to ``torchaudio.functional``. The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206). [This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English. The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, number of iterations, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2231 Reviewed By: mthrok Differential Revision: D34474503 Pulled By: nateanl fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb
-
Zhaoheng Ni authored
Summary: This PR adds `rtf_evd` method to `torchaudio.functional`. The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. The input argument is the power spectral density (PSD) matrix of the target speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2230 Reviewed By: mthrok Differential Revision: D34474188 Pulled By: nateanl fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134
-
Zhaoheng Ni authored
Summary: This PR adds the ``mvdr_weights_rtf`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution that applies the relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for reference. The input arguments are the complex-valued RTF Tensor of the target speech, the power spectral density (PSD) matrix of noise, and an int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2229 Reviewed By: mthrok Differential Revision: D34474119 Pulled By: nateanl fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2
-
Zhaoheng Ni authored
Summary: This PR adds the ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, and an int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
-
Zhaoheng Ni authored
Summary: This PR adds the ``psd`` method to ``torchaudio.functional``. It computes the power spectral density (PSD) matrix of the complex-valued spectrum. The method also supports normalization of the Time-Frequency mask. Pull Request resolved: https://github.com/pytorch/audio/pull/2227 Reviewed By: mthrok Differential Revision: D34473908 Pulled By: nateanl fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9
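A hedged end-to-end sketch combining the functionals introduced in this batch of commits (``psd``, ``mvdr_weights_souden``, and ``apply_beamforming``); the tensors are random placeholders, and in practice the masks would come from a mask estimation network:

```python
import torch
import torchaudio.functional as F

channel, freq, time = 4, 257, 100
specgram = torch.rand(channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
mask_speech = torch.rand(freq, time)                            # T-F mask of target speech
mask_noise = torch.rand(freq, time)                             # T-F mask of noise

psd_s = F.psd(specgram, mask_speech)                 # (freq, channel, channel)
psd_n = F.psd(specgram, mask_noise)                  # (freq, channel, channel)
weights = F.mvdr_weights_souden(psd_s, psd_n, reference_channel=0)
enhanced = F.apply_beamforming(weights, specgram)    # single-channel (freq, time)
```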
-