- 20 Dec, 2022 1 commit
-
-
moto authored
Summary: If the input video has invalid PTS, the current precise seek fails, except when seeking to t=0. This commit updates the discard mechanism to fall back to `best_effort_timestamp` in such cases. `best_effort_timestamp` is simply the number of frames that have gone through the decoder, counted from the beginning of the file. This means that if the input file is very long and the seek target is near the end of the file, the StreamReader still decodes all the frames. For videos with valid PTS, `best_effort_timestamp` should be the same as `pts`. [[src](https://ffmpeg.org/doxygen/4.1/decode_8c.html#a8d86329cf58a4adbd24ac840d47730cf)] Pull Request resolved: https://github.com/pytorch/audio/pull/2916 Reviewed By: YosuaMichael Differential Revision: D42170204 Pulled By: mthrok fbshipit-source-id: 80c04dc376e0f427d41eb9feb44c251a1648a998
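Below is a minimal Python sketch of the fallback described above. The attribute names mirror FFmpeg's `AVFrame` fields (`pts`, `best_effort_timestamp`); the actual discard logic lives in the C++ StreamReader implementation, so treat this as an illustration, not the shipped code.

```python
# Sketch of the PTS-or-best_effort_timestamp discard rule (assumed names).
AV_NOPTS_VALUE = -0x8000000000000000  # FFmpeg's sentinel for "no PTS"

def should_discard(frame, seek_pts: int) -> bool:
    """Return True if a decoded frame precedes the precise-seek target."""
    ts = frame.pts
    if ts == AV_NOPTS_VALUE:
        # Invalid PTS: fall back to best_effort_timestamp, which effectively
        # counts frames decoded from the beginning of the file.
        ts = frame.best_effort_timestamp
    return ts < seek_pts
```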
-
- 09 Nov, 2022 1 commit
-
-
Grigory Sizov authored
Summary: Closes T136364380

Added the [WavLM Model](https://github.com/microsoft/UniSpeech/tree/main/WavLM):
- Added a `WavLMSelfAttention` class (from the [original implementation](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py)) and adjusted the existing Encoder and Transformer classes to be compatible with it
- Added factory functions `wavlm_model`, `wavlm_base`, and `wavlm_large` to `models/wav2vec2/model.py`
- Added bundles for the base and large models to the pipelines. **TODO**: pre-trained model weights are not yet uploaded to `download.pytorch.org`; permissions have not been granted yet.

## Tests
- Expanded the HuggingFace integration tests to cover WavLM. For these tests, added JSON configs for the base and large models from HF ([base](https://huggingface.co/microsoft/wavlm-base/blob/main/config.json), [large](https://huggingface.co/microsoft/wavlm-large/blob/main/config.json)) to the test assets
- Expanded the TorchScript and quantization tests to cover WavLM

## Comments
There are a few workarounds I had to introduce:
- Quantization tests for WavLM were breaking at [`torch.cat`](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR466) ~~until I excluded the arguments of `torch.cat` from quantization [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR368-R369). I haven't found a better way to fix it, let me know if there is one~~ The reason seems to be that quantization replaces the `.bias` and `.weight` attributes of a `Linear` module with methods. Since we use the weights and biases directly, the code was breaking. The final solution, suggested by nateanl, was to define the attention weights and biases directly in `WavLMSelfAttention`, skipping the `Linear` layers
- ~~WavLM uses a position embedding in the first layer of the encoder, but not in the subsequent ones. So the [UniSpeech](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py#L342) and [HF](https://github.com/huggingface/transformers/blob/b047472650cba259621549ac27b18fd2066ce18e/src/transformers/models/wavlm/modeling_wavlm.py#L441-L442) implementations only create this embedding module in the layers where it is used. However, we can't do this here because it breaks TorchScript. As a solution, I add a dummy `Identity` module to `WavLMSelfAttention` when the actual embedding is not needed: [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR361-R368).~~ Thanks nateanl for resolving this!
- I had to add dummy `position_bias` and `key_padding_mask` arguments to `SelfAttention.forward` to make the TorchScript tests pass. Since both `SelfAttention` and `WavLMSelfAttention` are called from `EncoderLayer`, they need to have compatible signatures. Having a variable number of arguments with `**kwargs` or checking the object class doesn't seem to work with TorchScript, so I instead made both types of attention accept `position_bias` and `key_padding_mask` arguments.

Nit: do we still need to specify `__all__` if there are no wildcard imports in `__init__.py`, e.g. in `torchaudio/models/__init__.py`?

Pull Request resolved: https://github.com/pytorch/audio/pull/2822 Reviewed By: nateanl Differential Revision: D41121855 Pulled By: sgrigory fbshipit-source-id: 9f4f787e5810010de4e74cb704063a26c66767d7
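A rough usage sketch of the new factory functions, assuming they follow the same interface as the existing wav2vec2 factories (the pre-trained bundles only become usable once the weights mentioned in the TODO are uploaded):

```python
import torch
import torchaudio

# wavlm_base() builds a randomly initialized model; also wavlm_large(), wavlm_model().
model = torchaudio.models.wavlm_base()
model.eval()

waveform = torch.randn(1, 16000)  # 1 second of 16 kHz audio
with torch.no_grad():
    features, _ = model.extract_features(waveform)
print(len(features), features[0].shape)  # one feature tensor per transformer layer
```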
-
- 31 Oct, 2022 1 commit
-
-
Joao Gomes authored
Summary: cc mthrok. Implements precise seek and seek to any frame in torchaudio. Pull Request resolved: https://github.com/pytorch/audio/pull/2737 Reviewed By: mthrok Differential Revision: D40546716 Pulled By: jdsgomes fbshipit-source-id: d37da7f55977337eb16a3c4df44ce8c3c102698e
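A hypothetical usage sketch of what the feature enables. The `mode` keyword and its values are taken from the released `torchaudio.io.StreamReader` documentation rather than from this PR's diff, so treat the exact signature as an assumption:

```python
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")  # hypothetical input file
reader.add_basic_video_stream(frames_per_chunk=1)

# "precise" decodes from the preceding keyframe and discards frames before the
# target timestamp; "any" jumps to the nearest frame without that guarantee.
reader.seek(10.0, mode="precise")
reader.fill_buffer()
(chunk,) = reader.pop_chunks()
```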
-
- 09 Aug, 2022 1 commit
-
-
Caroline Chen authored
Summary: Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs. The `ctc_decoder` API is as follows:
- To decode with KenLM, pass the KenLM language model path to the `lm` argument
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass it to the `lm` argument. Additionally, create a file of the LM words listed in order of the LM index, one word per line, and pass the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (the default)

Validated against the fairseq w2l decoder on a sample LibriSpeech dataset and LM. Code for the validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, as well as decoding with a biased LM.

Follow ups:
- Train a simple LM on LibriSpeech and demonstrate its usage in a tutorial or the examples directory

cc jacobkahn Pull Request resolved: https://github.com/pytorch/audio/pull/2528 Reviewed By: mthrok Differential Revision: D38243802 Pulled By: carolineechen fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7
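A minimal sketch of the custom-LM path, assuming the `CTCDecoderLM` / `CTCDecoderLMState` interface shown in the torchaudio decoder documentation; the file paths are hypothetical placeholders:

```python
from torchaudio.models.decoder import CTCDecoderLM, CTCDecoderLMState, ctc_decoder


class ZeroLM(CTCDecoderLM):
    """Toy custom LM that assigns a score of 0.0 to every token (no-LM behavior)."""

    def __init__(self):
        CTCDecoderLM.__init__(self)  # initialize the underlying flashlight base class

    def start(self, start_with_nothing: bool = False) -> CTCDecoderLMState:
        return CTCDecoderLMState()

    def score(self, state: CTCDecoderLMState, usr_token_idx: int):
        # Return the successor state for this token together with its LM score.
        return state.child(usr_token_idx), 0.0

    def finish(self, state: CTCDecoderLMState):
        return state, 0.0


# Hypothetical asset paths; the custom LM instance is passed via `lm`.
decoder = ctc_decoder(lexicon="lexicon.txt", tokens="tokens.txt", lm=ZeroLM())
```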
-
- 26 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for lexicon-free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation.

Follow ups:
- Add pretrained LM support for lexicon-free decoding
- Add an example in the tutorial
- Replace the flashlight C++ source code with the flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
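A brief sketch of how lexicon-free decoding is selected, assuming the released `ctc_decoder` factory where passing `lexicon=None` chooses the lexicon-free code path; `tokens.txt` is a hypothetical token list with one token per line:

```python
from torchaudio.models.decoder import ctc_decoder

lexicon_free_decoder = ctc_decoder(
    lexicon=None,         # no lexicon: hypotheses are formed directly over tokens
    tokens="tokens.txt",  # hypothetical token list
    lm=None,              # optionally a KenLM path or a CTCDecoderLM subclass
    beam_size=50,
)
# hypotheses = lexicon_free_decoder(emissions)  # emissions: (batch, frames, num_tokens) CPU tensor
```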
-
- 21 Jan, 2022 1 commit
-
-
moto authored
Summary: Split from https://github.com/pytorch/audio/issues/2164

Add new test assets. This commit is added separately so that this commit message, which records the origin of the file, is easier to find.

The original video is in the public domain, per:
- https://svs.gsfc.nasa.gov/13013
- https://www.nasa.gov/multimedia/guidelines/index.html (the YouTube page directly says so)
- https://www.youtube.com/watch?v=6zNsc0e3Zns

The video was modified to fit the testing needs:
1. Multiple audio/video streams
2. Non-audio/video (subtitle) streams
3. Different FPS and sampling rates
4. Versions without audio and without video

```
#!/usr/bin/env bash
original=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
subtitle=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-SRT-CC.en_US.srt

# Fetch the original video, embed the subtitle
ffmpeg -i "${original}" -i "${subtitle}" -c:v copy -c:a copy -c:s mov_text -metadata:s:2 language=eng original.mp4 -y

# Extract, rescale video and resample audio
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=480:270 -af aresample=16000 tmp1.mp4 -y
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=320:180 -r 25 -af aresample=8000 tmp2.mp4 -y

# Merge them, retaining all the streams (6 in total)
ffmpeg -i tmp2.mp4 -i tmp1.mp4 -map 0 -map 1 -c:s copy nasa_13013.mp4 -y

# Make versions without video / audio
ffmpeg -i tmp2.mp4 -c copy -vn nasa_13013_no_video.mp4 -y
ffmpeg -i tmp2.mp4 -c copy -an nasa_13013_no_audio.mp4 -y
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2167 Reviewed By: carolineechen Differential Revision: D33712954 Pulled By: mthrok fbshipit-source-id: b7cfc1358043a4abd1c0b416e8a8fb0039867211
-
- 23 Dec, 2021 2 commits
-
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072 -- splitting up the PR for easier review. This PR adds the Python decoder API and a basic README. Pull Request resolved: https://github.com/pytorch/audio/pull/2089 Reviewed By: mthrok Differential Revision: D33299818 Pulled By: carolineechen fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
- 18 Nov, 2021 1 commit
-
-
Facebook Community Bot authored
Co-authored-by: Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
-
- 17 Nov, 2021 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2015 as titled Reviewed By: hwangjeff, mthrok Differential Revision: D32495691 fbshipit-source-id: 60d8a2337585e3147f24ca9f0b6518e30cd9134a
-
- 22 Oct, 2021 1 commit
-
-
moto authored
- Make the test support other languages
- Fetch test assets on-the-fly
-
- 05 Oct, 2021 1 commit
-
-
moto authored
-
- 28 Sep, 2021 1 commit
-
-
moto authored
This commit adds the following HuBERT model architectures:
- `base` (pre-training)
- `large` (pre-training / fine-tuning)
- `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as in `Wav2Vec2Model`, it reuses the existing modules. With these models, it is possible to:
- import a pre-trained model published by `fairseq` and TorchScript it
- fine-tune the existing model for a downstream task
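A hypothetical usage sketch, assuming the HuBERT factory functions expose the same interface as `Wav2Vec2Model` (as the reuse of its modules suggests):

```python
import torch
import torchaudio

# Randomly initialized fine-tuning-style model; hubert_large() and
# hubert_xlarge() follow the same pattern.
model = torchaudio.models.hubert_base()
model.eval()

waveform = torch.randn(1, 16000)
with torch.no_grad():
    features, _ = model(waveform)  # encoder output: (batch, frames, feature_dim)

# The architecture is TorchScript-able, e.g. after importing fairseq weights.
scripted = torch.jit.script(model)
```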
-
- 22 Sep, 2021 1 commit
-
-
moto authored
Summary: Update fairseq references from master to main elsewhere Reviewed By: alexeib Differential Revision: D30938472 fbshipit-source-id: 243b98550207f241c9d3265bf3d4060350aaf0a8 Co-authored-by: Diana Liskovich <dianaml@fb.com>
-
- 04 Jun, 2021 1 commit
-
-
moto authored
`torchaudio.compliance.kaldi.resample_waveform` has been replaced with `torchaudio.functional.resample`.
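A short migration sketch; the keyword names follow the current `torchaudio.functional.resample` signature, and the commented-out "before" call is only an approximation of the old API:

```python
import torch
import torchaudio.functional as F

waveform = torch.randn(1, 44100)  # 1 second at 44.1 kHz

# Before (roughly): torchaudio.compliance.kaldi.resample_waveform(waveform, 44100.0, 16000.0)
resampled = F.resample(waveform, orig_freq=44100, new_freq=16000)
```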
-
- 01 Jun, 2021 1 commit
-
-
moto authored
-
- 27 May, 2021 1 commit
-
-
moto authored
-
- 15 Mar, 2021 1 commit
-
-
moto authored
-
- 09 Feb, 2021 1 commit
-
-
moto authored
-
- 30 Dec, 2020 1 commit
-
-
Aziz authored
-
- 22 Dec, 2020 1 commit
-
-
moto authored
-
- 05 Aug, 2020 1 commit
-
-
moto authored
We have been running the unit tests with an editable installation (i.e. `python setup.py develop`), with which we missed issues like #842. This commit makes the installation in CI non-editable and changes the test directory structure so that the source code does not shadow the installed version of `torchaudio`. With a simple `pytest test`, `pytest` modifies `sys.path` and prepends the checked-out repository, which shadows the installed version. To remedy this, the whole test suite has been moved from `./test` to `./test/torchaudio_unittest`. This adds a nice module structure to our test code, and we can use absolute imports in each test module, which makes it possible again to run tests with `python -m unittest torchaudio_unittest/XXX.py`. This change does not affect the regular development process (`python setup.py develop` && `pytest test`).
-