- 21 May, 2022 1 commit
-
-
moto authored
Summary: This commit adds file-like object support to the Streaming API. ## Features - File-like objects are expected to implement `read(self, n)`. - Additionally, `seek(self, offset, whence)` is used if available. - Without a `seek` method, some formats cannot be decoded properly. - To work around this, one can use the existing `decoder` option to tell it which decoder to use. - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`. - So that the arguments common to both audio and video come before the rest, the order of the arguments is changed. - The `dtype` and `format` arguments were also changed to make them consistent across the audio/video methods. ## Code structure The approach is very similar to how file-like objects are supported in the sox-based I/O. In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind; if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.  ## Refactoring involved - Extracted to https://github.com/pytorch/audio/issues/2402 - Some implementation in the original TorchBind surface layer was converted to a wrapper class so that it can be re-used from the PyBind11 bindings. The wrapper class serves to simplify the binding. - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python. - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to the `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly. ## TODO: - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and decode it directly (with/without HW decoding). 
Pull Request resolved: https://github.com/pytorch/audio/pull/2400 Reviewed By: carolineechen Differential Revision: D36520073 Pulled By: mthrok fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
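The routing rule described above (path string goes to the TorchBind binding; anything exposing `read` goes to the PyBind11 binding) can be sketched in plain Python. All names here are illustrative, not torchaudio's actual API.

```python
import io

class HeaderOnlyStream:
    """Minimal file-like object implementing only read(n), without seek()."""
    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        return self._buf.read(n)

def dispatch(src):
    # Mirrors the commit's description: strings are routed to the
    # TorchBind-bound implementation, objects with a `read` attribute
    # to the PyBind11-bound one.
    if isinstance(src, str):
        return "torchbind"
    if hasattr(src, "read"):
        return "pybind11"
    raise TypeError("src must be a path string or a file-like object")

assert dispatch("video.mp4") == "torchbind"
assert dispatch(HeaderOnlyStream(b"\x00\x01")) == "pybind11"
```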
-
- 20 May, 2022 3 commits
-
-
moto authored
Summary: After https://github.com/pytorch/audio/issues/2395, the build_doc job is exceeding the default no-output-timeout threshold (10m). This commit updates the timeout threshold to 30m. It also moves the installation of tools to the previous step. Pull Request resolved: https://github.com/pytorch/audio/pull/2399 Reviewed By: carolineechen Differential Revision: D36539022 Pulled By: mthrok fbshipit-source-id: 391764a0fb5bf87cfb2beaab401a90dcb56493e5
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2392 Refactors LibriSpeech tests to accommodate different dataset classes Reviewed By: xiaohui-zhang Differential Revision: D36387835 fbshipit-source-id: 73b4e7565b4a077b25f036f4bd854ac7f2194b28
-
moto authored
Summary: This commit adds a tutorial on enabling and using NVDEC with the Streaming API. https://output.circle-artifacts.com/output/job/19e66a4b-1819-4804-8834-d38e6c80c4fd/artifacts/0/docs/hw_acceleration_tutorial.html Because the use of NVDEC requires building/installing FFmpeg from source, this tutorial was authored on Google Colab, tailored to its environment. The tutorial here is the result of the notebook execution, with a link to the publicly accessible Google Colab notebook. Pull Request resolved: https://github.com/pytorch/audio/pull/2393 Reviewed By: hwangjeff Differential Revision: D36404408 Pulled By: mthrok fbshipit-source-id: 9c820d3db4d06c5b343ecad0708489125ca06948
-
- 19 May, 2022 2 commits
-
-
Eli Uriegas authored
Summary: To resolve nightly / general build issues relating to OpenMP not being found, see https://hud.pytorch.org/pytorch/audio/commit/c6a376cc5679c1940e49fc3e0ba22eaead6c2467 ``` -- Found Torch: /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/torch/lib/libtorch.dylib CMake Error at /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message): Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES) Call Stack (most recent call first): /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE) /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindOpenMP.cmake:544 (find_package_handle_standard_args) CMakeLists.txt:131 (find_package) -- Configuring incomplete, errors occurred! ``` Signed-off-by:
Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: https://github.com/pytorch/audio/pull/2404 Reviewed By: atalman Differential Revision: D36495791 Pulled By: seemethere fbshipit-source-id: 7b6fa2a62fda6fc468cfcbdf8d2163e6b9c327b0
-
moto authored
Summary: * Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be re-used in PyBind11. * Move the `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This makes the PyBind11-based binding simpler, as it need not deal with dtype. * Move the `add_[audio|video]_stream` wrapper signature to the Streamer core, so that Streamer deals with `c10::optional` directly.† † Related to this, there is a slight change in how an empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as the empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the filter expression that is actually being used. Ref https://github.com/pytorch/audio/issues/2400 Pull Request resolved: https://github.com/pytorch/audio/pull/2402 Reviewed By: nateanl Differential Revision: D36488808 Pulled By: mthrok fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
-
- 18 May, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: In Wav2Vec2 and HuBERT model training, the convolutional feature extraction layers use `group_norm` for normalization in the `Base` model, while they use `layer_norm` in the `Large` and `XLarge` models. For the `Base` model, the gradients of the feature extraction layers can be unstable in pre-training, so we scale down the gradient by multiplying it by 0.1. In this PR, we add such an argument to `HuBERTPretrainModel` to control the gradient of the feature extractor layers. We also put the argument in the factory functions (`hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`). The reason is that in fine-tuning, the feature extractor's parameters are fixed, so we can multiply the gradient by 0.0 to avoid backpropagating gradients. Pull Request resolved: https://github.com/pytorch/audio/pull/2335 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D35646928 Pulled By: nateanl fbshipit-source-id: 6a9563e227aac6e3127b634357946d860f26c994
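Gradient scaling of this kind is commonly implemented with a custom autograd function that is the identity in the forward pass and multiplies the gradient in the backward pass (fairseq calls this `GradMultiply`). The sketch below is illustrative, not torchaudio's implementation.

```python
import torch

class GradMultiply(torch.autograd.Function):
    """Identity in forward; scales the incoming gradient in backward."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        # Gradient w.r.t. x is scaled; `scale` itself gets no gradient.
        return grad * ctx.scale, None

x = torch.ones(3, requires_grad=True)
y = GradMultiply.apply(x, 0.1).sum()
y.backward()
# Each element's gradient is 0.1 instead of 1.0; a scale of 0.0 would
# freeze the upstream (feature extractor) parameters entirely.
print(x.grad)
```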
-
- 17 May, 2022 1 commit
-
-
moto authored
Summary: This commit updates `window.sideMenus.handleRightMenu` so that subsections are expanded on tutorials by default. https://output.circle-artifacts.com/output/job/98508917-87df-4666-9958-c70683b3245d/artifacts/0/docs/tutorials/audio_io_tutorial.html Tutorial subsections are important because they have anchors, which allow us to link to specific figures / audio samples. When responding to issues/questions where there is a corresponding code snippet in a tutorial, it is often easiest to answer with links to the tutorial. However, by default the tutorial page collapses the right sidebar, and I have to click the small "+" symbols to navigate to the subsection, and the state of expansion does not persist across page refreshes. This has been a pain point since we updated the Sphinx version to 3 in https://github.com/pytorch/audio/pull/1685. Pull Request resolved: https://github.com/pytorch/audio/pull/2397 Reviewed By: xiaohui-zhang Differential Revision: D36429745 Pulled By: mthrok fbshipit-source-id: 97a5ae9270e68f8e88f0bca766d5a2c1839634e3
-
- 16 May, 2022 1 commit
-
-
moto authored
Summary: This commit moves the `build_doc` job to run on top of the Conda binary build job. The motivation is that Conda provides easy access to third-party tools that are required to build complex documentation. Specifically, in https://github.com/pytorch/audio/pull/2393, an ipynb-style tutorial is being added, which requires `nbsphinx`. `nbsphinx` requires the `pandoc` package, and there was some issue with the version from PyPI. A workaround is to use the one from the Conda package. Pull Request resolved: https://github.com/pytorch/audio/pull/2395 Reviewed By: carolineechen, nateanl Differential Revision: D36404407 Pulled By: mthrok fbshipit-source-id: 26ec5ebfd5be795384306a9f24817a2eb3ec96c1
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best-effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402214 fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
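The "frog"/"FROG"/"Frog" point above can be demonstrated with plain Python: a case-insensitive sort key keeps all casings of a name adjacent, while plain lexicographic (ASCII) ordering scatters them.

```python
names = ["Frog", "apple", "FROG", "Banana", "frog"]

# Plain lexicographic: uppercase letters sort before lowercase,
# so the casings of "frog" end up separated from each other.
print(sorted(names))
# ['Banana', 'FROG', 'Frog', 'apple', 'frog']

# Case-insensitive (as µsort does): all casings stay adjacent,
# in their original relative order because sorted() is stable.
print(sorted(names, key=str.casefold))
# ['apple', 'Banana', 'Frog', 'FROG', 'frog']
```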
-
- 13 May, 2022 2 commits
-
-
hwangjeff authored
Summary: Refactors `librispeech.py` to clarify its logic. Pull Request resolved: https://github.com/pytorch/audio/pull/2387 Reviewed By: nateanl Differential Revision: D36359176 Pulled By: hwangjeff fbshipit-source-id: 595dd1421738279896348448dd72ca57bfe7cef2
-
moto authored
Summary: This commit moves the Streaming API out of the prototype module. * The related classes are renamed as follows: - `Streamer` -> `StreamReader` - `SourceStream` -> `StreamReaderSourceStream` - `SourceAudioStream` -> `StreamReaderSourceAudioStream` - `SourceVideoStream` -> `StreamReaderSourceVideoStream` - `OutputStream` -> `StreamReaderOutputStream` This change is a preemptive measure for the possibility of adding a `StreamWriter` API. * Replace the BUILD_FFMPEG build arg with USE_FFMPEG. We are not building FFmpeg, so USE_FFMPEG is more appropriate. --- After https://github.com/pytorch/audio/issues/2377 Remaining TODOs: (different PRs) - [ ] Introduce `is_ffmpeg_binding_available` function. - [ ] Refactor C++ code: - Rename `Streamer` to `StreamReader`. - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`. - Rename `prototype.cpp` to `stream_reader_binding.cpp`. - Introduce `stream_reader` directory. - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381) Pull Request resolved: https://github.com/pytorch/audio/pull/2378 Reviewed By: carolineechen Differential Revision: D36359299 Pulled By: mthrok fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
-
- 12 May, 2022 4 commits
-
-
moto authored
Summary: This commit updates the lazy module initialization logic for `torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`. - The modules are importable regardless of optional dependencies, i.e. `import torchaudio.prototype.io` does not trigger the check for optional dependencies. - Optional dependencies are checked when the actual API is imported for the first time, i.e. `from torchaudio.prototype.io import Streamer` triggers the check for optional dependencies. The downside is that `import torchaudio.prototype.io.Streamer` no longer works. ## Details: Starting from Python 3.7, modules can have a `__getattr__` function, which serves as a fallback if the import mechanism cannot find the attribute. This can be used to implement lazy import. ```python def __getattr__(name): global pi if name == 'pi': import math pi = math.pi return pi raise AttributeError(...) ``` Ref: https://twitter.com/raymondh/status/1094686528440168453 The implementation performs lazy import for the APIs that work with external/optional dependencies. In addition, it checks that the binding is initialized only once. ## Why is this the preferable approach? Previously, the optional dependencies were checked at the time the module was imported; https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5 As long as this module is in `prototype`, which we ask users to import explicitly, users had control over whether they want to install the optional dependencies. However, this approach only works for one optional dependency per module. Say we add a different I/O library as an optional dependency; we would need to put all the APIs in a dedicated submodule. This prevents us from having a flat namespace, i.e. the I/O modules with multiple optional dependencies would look like ```python # Client code from torchaudio.io.foo import FooFeature from torchaudio.io.bar import BarFeature ``` whereas the new approach allows ```python # Client code from torchaudio.io import FooFeature, BarFeature ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2377 Reviewed By: nateanl Differential Revision: D36305603 Pulled By: mthrok fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae -
Zhaoheng Ni authored
Summary: - Use `apply_beamforming`, `rtf_evd`, `rtf_power`, `mvdr_weights_souden`, `mvdr_weights_rtf` methods under `torchaudio.functional` to replace the class methods. - Refactor docstrings in `PSD` and `MVDR`. - Put `_get_mvdr_vector` outside of `MVDR` class as it doesn't call self methods inside. - Since MVDR uses einsum for matrix operations, packing and unpacking batches are not necessary. It can be tested by the [batch_consistency_test](https://github.com/pytorch/audio/blob/main/test/torchaudio_unittest/transforms/batch_consistency_test.py#L202). Removed it from the code. Pull Request resolved: https://github.com/pytorch/audio/pull/2383 Reviewed By: carolineechen, mthrok Differential Revision: D36338373 Pulled By: nateanl fbshipit-source-id: a48a6ae2825657e5967a19656245596cdf037c5f
-
Zhaoheng Ni authored
Summary: - When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, sometimes the value can be negative, which results in an empty label. The training example will then hurt performance after zero-padding (i.e., the labels are all zero for the input waveform). This PR fixes the bug by checking whether `label_start` is negative, and changing it to zero if so. - If `pad` is True, `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/2296 Reviewed By: mthrok Differential Revision: D36323217 Pulled By: nateanl fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
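The clamping fix can be sketched in plain Python (floor division mirrors `rounding_mode="floor"`; the function name and argument units are illustrative, not the actual dataset code):

```python
def label_start(audio_start: int, kernel_size: int, stride: int,
                sample_rate: int) -> int:
    """Map an audio crop start to a label start index, clamped at zero."""
    start = (audio_start - kernel_size * sample_rate) // (stride * sample_rate)
    # Without the clamp, a crop near the beginning of the waveform yields
    # a negative index and therefore an empty label slice.
    return max(start, 0)

# Crop at the very beginning: the raw formula gives a negative index.
print(label_start(0, 25, 20, 16))      # clamped to 0
print(label_start(16000, 25, 20, 16))  # ordinary positive case
```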
-
John Reese authored
Summary: Applies the black-fbsource codemod with the new build of pyfmt. paintitblack Reviewed By: lisroach Differential Revision: D36324783 fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
-
- 11 May, 2022 6 commits
-
-
moto authored
Summary: The Conda package build performs a simple smoke test, which is different from the smoke_test jobs we define in our CI. Currently the Conda packaging smoke test verifies the importability of `torchaudio.prototype.io`, which requires FFmpeg 4. 1. We list FFmpeg 4 as a runtime requirement, but this means that conda's dependency resolver takes FFmpeg 4 into consideration. FFmpeg 5 was released this year, and we can expect the user base to move to FFmpeg 5 gradually. If the user environment has some constraint on FFmpeg, torchaudio will conflict with it, which will prevent users from installing torchaudio. 2. In #2377 the way optional dependencies are checked/initialized was changed, so this Conda smoke test will no longer check integrity with the FFmpeg libraries. To solve the issues above, this commit moves the part that tests integrity with the FFmpeg libraries to the smoke test we define on CircleCI. Pull Request resolved: https://github.com/pytorch/audio/pull/2381 Reviewed By: carolineechen Differential Revision: D36323706 Pulled By: mthrok fbshipit-source-id: 57ca816e0f3ad8e16d21e56062f6ed8a09ab93a3
-
Zhaoheng Ni authored
Summary: The modules include: - PSD - MVDR - RTFMVDR - SoudenMVDR Pull Request resolved: https://github.com/pytorch/audio/pull/2382 Reviewed By: carolineechen Differential Revision: D36314096 Pulled By: nateanl fbshipit-source-id: 9d7d962b1c70cdc435a579191ad88838dd6fc0ba
-
moto authored
Summary: For a while now, CodeQL has been constantly emitting a red signal, but the team does not know what it is or how to fix it. At this point it is purely noise and does not provide a valuable signal. Ref https://github.com/pytorch/audio/issues/2314 Pull Request resolved: https://github.com/pytorch/audio/pull/2380 Reviewed By: carolineechen Differential Revision: D36305599 Pulled By: mthrok fbshipit-source-id: 27ece58730066543600f3873397b9a239e54beb0
-
moto authored
Summary: On CircleCI, Windows unittests are failing for Python 3.7 with `PermissionError` at the end of tests, when cleaning up the temporary directory. According to the discussion at https://github.com/python/cpython/issues/74168, this is caused by a known issue with `shutil.rmtree`. In that thread it is advised to simply ignore the error, as it is not guaranteed that temp directories are cleaned up. This commit follows the same advice and simply ignores the error so that our CI gets back to green. Pull Request resolved: https://github.com/pytorch/audio/pull/2379 Reviewed By: carolineechen Differential Revision: D36305595 Pulled By: mthrok fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5
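A minimal sketch of the workaround: `shutil.rmtree` accepts an `ignore_errors` flag, so cleanup failures (such as the Windows `PermissionError`) no longer abort the test run.

```python
import os
import shutil
import tempfile

# Create a temp directory with a file in it, then remove it.
path = tempfile.mkdtemp()
open(os.path.join(path, "dummy.txt"), "w").close()

# ignore_errors=True swallows errors such as PermissionError during
# removal instead of raising, matching the advice in the CPython thread.
shutil.rmtree(path, ignore_errors=True)
print(os.path.exists(path))
```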
-
hwangjeff authored
Summary: Modifies the example LibriSpeech Conformer RNN-T recipe as follows: - Moves data loading and transforms logic from the lightning module to the data module (improves generalizability and reusability of the lightning module and data module). - Moves transforms logic from the dataloader collator function to the dataset (resolves dataloader multiprocessing issues on certain platforms). - Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments). - Modifies the training script to allow specifying the path to a model checkpoint to restart training from. Pull Request resolved: https://github.com/pytorch/audio/pull/2366 Reviewed By: mthrok Differential Revision: D36305028 Pulled By: hwangjeff fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba
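The lambda-to-`partial` point is easy to demonstrate: a lambda cannot be pickled (which breaks dataloader workers under the spawn start method), while a `functools.partial` over a module-level function can. The `scale` function below is an illustrative stand-in for a transform.

```python
import pickle
from functools import partial

def scale(waveform, factor):
    """Toy stand-in for a transform: multiply every sample by a factor."""
    return [x * factor for x in waveform]

# partial(scale, factor=2) pickles fine because `scale` is a named,
# module-level function; the equivalent `lambda w: scale(w, 2)` would
# raise a PicklingError.
transform = partial(scale, factor=2)
restored = pickle.loads(pickle.dumps(transform))
print(restored([1, 2, 3]))  # [2, 4, 6]
```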
-
moto authored
Summary: This commit refactors the constructors of the wrapper classes so that the wrapper classes are only responsible for deallocating the underlying FFmpeg custom structures. The responsibility for custom initialization is moved to helper functions. Context: The FFmpeg API uses a bunch of raw pointers, which require dedicated allocators and deallocators. In torchaudio we wrap these pointers with `std::unique_ptr<>` to adopt RAII semantics. Currently, all of the customization logic required for `Streamer` is handled by the constructor of the wrapper class, like the following: ``` AVFormatContextPtr( const std::string& src, const std::string& device, const std::map<std::string, std::string>& option); ``` This constructor allocates the raw `AVFormatContext*` pointer, initializes it with the given options, then parses the input media. As we consider write/encode features, which require a different way of initializing `AVFormatContext*`, making this the responsibility of the `AVFormatContextPtr` constructor reduces flexibility. Thus this commit moves the customization to helper factory functions. - `AVFormatContextPtr(...)` -> `get_input_format_context(...)` - `AVCodecContextPtr(...)` -> `get_decode_context(...)` Pull Request resolved: https://github.com/pytorch/audio/pull/2373 Reviewed By: hwangjeff Differential Revision: D36230148 Pulled By: mthrok fbshipit-source-id: 202d57d549223904ee958193f3b386ef5a9cda3a
-
- 10 May, 2022 8 commits
-
-
hwangjeff authored
Summary: Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241. Continuation of https://github.com/pytorch/audio/issues/2324. Pull Request resolved: https://github.com/pytorch/audio/pull/2358 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D36137992 Pulled By: hwangjeff fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
-
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2375 The MVDR module internally transforms the dtype of complex tensors to `torch.complex128` for computation and transforms it back to the original dtype before returning the Tensor. However, it didn't convert back successfully due to `specgram_enhanced.to(dtype)`, which should be `specgram_enhanced = specgram_enhanced.to(dtype)`. Fix it to make the output dtype consistent with the original input. Pull Request resolved: https://github.com/pytorch/audio/pull/2376 Reviewed By: hwangjeff Differential Revision: D36280851 Pulled By: nateanl fbshipit-source-id: 553d1b98f899547209a4e3ebc59920c7ef1f3112
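The root cause generalizes: `Tensor.to` is out-of-place and returns a new tensor, so its result must be reassigned. A minimal reproduction of the bug pattern:

```python
import torch

specgram = torch.zeros(2, 2, dtype=torch.complex128)

# Buggy form: .to() returns a converted copy, which is discarded here,
# so the original tensor's dtype is unchanged.
specgram.to(torch.complex64)
print(specgram.dtype)  # torch.complex128

# Fixed form: reassign the result of .to().
specgram = specgram.to(torch.complex64)
print(specgram.dtype)  # torch.complex64
```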
-
Kyle Chen authored
Summary: previous update for rocm: https://github.com/pytorch/audio/pull/2186 Pull Request resolved: https://github.com/pytorch/audio/pull/2362 Reviewed By: seemethere Differential Revision: D36283672 Pulled By: mthrok fbshipit-source-id: bfd38940d027c8ccd72ab48991e5ab7f84b0e9c0
-
Zhaoheng Ni authored
Summary: Add a new design of MVDR module. The RTFMVDR module supports the method based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise. The input arguments are: - multi-channel spectrum. - RTF vector of the target speech - PSD matrix of noise. - reference channel in the microphone array. - diagonal_loading option to enable or disable diagonal loading in matrix inverse computation. - diag_eps for computing the inverse of the matrix. - eps for computing the beamforming weight. The output of the module is the single-channel complex-valued spectrum for the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2368 Reviewed By: carolineechen Differential Revision: D36214940 Pulled By: nateanl fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
-
Zhaoheng Ni authored
Summary: When computing the MVDR beamforming weights using the power iteration method, the PSD matrix of noise can be applied with diagonal loading to improve the robustness. This is also applicable to computing the RTF matrix (See https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example). This also aligns with current `torchaudio.transforms.MVDR` module to keep the consistency. This PR adds the `diagonal_loading` argument with `True` as default value to `torchaudio.functional.rtf_power`. Pull Request resolved: https://github.com/pytorch/audio/pull/2369 Reviewed By: carolineechen Differential Revision: D36204130 Pulled By: nateanl fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
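The diagonal loading described above amounts to adding a small multiple of the identity, scaled by the trace of the PSD matrix, before inversion so the matrix stays well-conditioned. The sketch below is illustrative; the function name, scaling choice, and default `diag_eps` are assumptions, not torchaudio's exact implementation.

```python
import torch

def diagonal_load(psd: torch.Tensor, diag_eps: float = 1e-7) -> torch.Tensor:
    """Regularize a (..., n, n) PSD matrix: psd + diag_eps * trace(psd) * I."""
    n = psd.size(-1)
    eye = torch.eye(n, dtype=psd.dtype, device=psd.device)
    # Per-matrix trace, kept broadcastable over any leading batch dims.
    trace = psd.diagonal(dim1=-2, dim2=-1).sum(-1)
    return psd + diag_eps * trace[..., None, None] * eye

psd = torch.eye(3)
loaded = diagonal_load(psd)
# Off-diagonal entries are untouched; the diagonal grows slightly,
# which keeps the subsequent matrix inverse numerically stable.
print(loaded)
```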
-
Zhaoheng Ni authored
Summary: Add a new design of MVDR module. The `SoudenMVDR` module supports the method proposed by [Souden et, al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are: - multi-channel spectrum. - PSD matrix of target speech. - PSD matrix of noise. - reference channel in the microphone array. - diagonal_loading option to enable or disable diagonal loading in matrix inverse computation. - diag_eps for computing the inverse of the matrix. - eps for computing the beamforming weight. The output of the module is the single-channel complex-valued spectrum for the enhanced speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2367 Reviewed By: hwangjeff Differential Revision: D36198015 Pulled By: nateanl fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
-
moto authored
Summary: This commit adds the `hw_accel` option to the `Streamer::add_video_stream` method. Specifying `hw_accel="cuda"` allows creating the chunk Tensor directly on CUDA, when the following conditions are met: 1. the video format is H264, 2. the underlying ffmpeg is compiled with NVENC, and 3. the client code specifies `decoder="h264_cuvid"`. A simple benchmark yields a ~7x improvement in decoding speed. <details> ```python import time from torchaudio.prototype.io import Streamer srcs = [ "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", # offline version ] patterns = [ ("h264_cuvid", None, "cuda:0"), # NVDEC on CUDA:0 -> CUDA:0 ("h264_cuvid", None, "cuda:1"), # NVDEC on CUDA:1 -> CUDA:1 ("h264_cuvid", None, None), # NVDEC -> CPU (None, None, None), # CPU ] for src in srcs: print(src, flush=True) for (decoder, decoder_options, hw_accel) in patterns: s = Streamer(src) s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel) t0 = time.monotonic() num_frames = 0 for i, (chunk, ) in enumerate(s.stream()): num_frames += chunk.shape[0] t1 = time.monotonic() print(chunk.dtype, chunk.shape, chunk.device) print(time.monotonic() - t0, num_frames, flush=True) ``` </details> ``` https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0 10.781158386962488 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1 10.771313901990652 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 27.88662809302332 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 83.22728440898936 6175 ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0 12.945253834011964 6175 torch.uint8 
torch.Size([5, 3, 1080, 1920]) cuda:1 12.870224556012545 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 28.03406483103754 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 82.6120332319988 6175 ``` With HW resizing <details> ```python import time from torchaudio.prototype.io import Streamer srcs = [ "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", ] patterns = [ # Decode with NVDEC, CUDA HW scaling -> CUDA:0 ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"), # Decoded with NVDEC, CUDA HW scaling -> CPU ("h264_cuvid", {"resize": "960x540"}, "", None), # CPU decoding, CPU scaling (None, None, "scale=width=960:height=540", None), ] for src in srcs: print(src, flush=True) for (decoder, decoder_options, filter_desc, hw_accel) in patterns: s = Streamer(src) s.add_video_stream( 5, decoder=decoder, decoder_options=decoder_options, filter_desc=filter_desc, hw_accel=hw_accel, ) t0 = time.monotonic() num_frames = 0 for i, (chunk, ) in enumerate(s.stream()): num_frames += chunk.shape[0] t1 = time.monotonic() print(chunk.dtype, chunk.shape, chunk.device) print(time.monotonic() - t0, num_frames, flush=True) ``` </details> ``` ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0 12.890056837990414 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 10.697489063022658 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 85.19899423001334 6175 https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0 10.712715593050234 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 11.030170071986504 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 84.8515750519582 6175 ``` Pull Request resolved: 
https://github.com/pytorch/audio/pull/2331 Reviewed By: hwangjeff Differential Revision: D36217169 Pulled By: mthrok fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b -
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371 Reviewed By: xiaohui-zhang Differential Revision: D36246167 Pulled By: carolineechen fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
-
- 09 May, 2022 1 commit
-
-
Andrey Talman authored
Summary: Clean up the old cuda115 version and other legacy versions Pull Request resolved: https://github.com/pytorch/audio/pull/2374 Reviewed By: nateanl, mthrok Differential Revision: D36250955 Pulled By: atalman fbshipit-source-id: 6b7f0e2926eeb688991c939901c980428cf8e7ef
-
- 06 May, 2022 2 commits
-
-
moto authored
Summary: This commit changes the way torchaudio binary distributions are built. * For all the binary distributions (conda/pip on Linux/macOS/Windows), build custom FFmpeg libraries. * The custom FFmpeg libraries do not use `--use-gpl` nor `--use-nonfree`, so that they stay LGPL. * The custom FFmpeg libraries employ rpath so that the torchaudio binary distributions look for the corresponding FFmpeg libraries installed in the runtime environment. * The torchaudio binary build process uses them to bootstrap its build. * The custom FFmpeg libraries are NOT shipped. This commit also adds a disclaimer about FFmpeg to the README. Pull Request resolved: https://github.com/pytorch/audio/pull/2355 Reviewed By: nateanl Differential Revision: D36202087 Pulled By: mthrok fbshipit-source-id: c30e5222ba190106c897e42f567cac9152dbd8ef
-
moto authored
Summary: The smoke test jobs simply perform `import torchaudio` to check that the package artifacts are sane. Originally, the CI executed this in the root directory, which was fine unless the source code was checked out. When the source code is checked out, performing `import torchaudio` in the root directory imports the source torchaudio directory instead of the installed package. This error is difficult to notice, so this commit introduces a common script that performs the smoke test after moving out of the root directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2365 Reviewed By: carolineechen Differential Revision: D36202069 Pulled By: mthrok fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
-
- 05 May, 2022 2 commits
-
-
moto authored
Summary: Currently, smoke tests are only executed on nightly jobs. This is inconvenient, as PRs that change the build process do not naturally get the signal. This commit changes that by always executing the smoke tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2364 Reviewed By: atalman Differential Revision: D36171267 Pulled By: mthrok fbshipit-source-id: e549965ba139b5992177b7a094d87c9ef4432a7f
-
Andrey Talman authored
Summary: This PR fixes Windows Smoke tests Tested via circleci : https://app.circleci.com/pipelines/github/pytorch/audio/10572/workflows/970fd791-25cc-4af4-8183-a7835e1891bf/jobs/637607 Pull Request resolved: https://github.com/pytorch/audio/pull/2361 Reviewed By: nateanl, mthrok Differential Revision: D36167317 Pulled By: atalman fbshipit-source-id: 1418ebffd74614cc1110dc032d16ee9502a7d571
-
- 28 Apr, 2022 2 commits
-
-
moto authored
Summary: libmad integration should be enabled only for source builds Pull Request resolved: https://github.com/pytorch/audio/pull/2354 Reviewed By: nateanl Differential Revision: D36012035 Pulled By: mthrok fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7
-
Andrey Talman authored
Summary: Fix audio win smoke test to use GPU hosts for CUDA builds Pull Request resolved: https://github.com/pytorch/audio/pull/2353 Reviewed By: mthrok Differential Revision: D36006928 Pulled By: atalman fbshipit-source-id: a27c4cc34093810c8cc08e01188e09b474478001
-
- 27 Apr, 2022 1 commit
-
-
Guo Liyong authored
Summary: This PR amends `RNNTBeamSearch`'s streaming decoding method to correctly unsqueeze `length` when its dimension is 0. Original comment: Is "input.dim() == 0" unreachable, since it could only be 2 or 3 per the assertion on Line 329? Pull Request resolved: https://github.com/pytorch/audio/pull/2344 Reviewed By: carolineechen, nateanl Differential Revision: D35899740 Pulled By: hwangjeff fbshipit-source-id: 84c1692b8cc9e5d35798d87f4a1bd052d94af9fb
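The dimension fix can be sketched in a few lines: a 0-dim (scalar) length tensor is unsqueezed into a 1-D tensor so downstream code can treat it as a batch of size 1. The variable names are illustrative.

```python
import torch

length = torch.tensor(80)  # scalar tensor: length.dim() == 0
if length.dim() == 0:
    # Promote to a 1-D tensor of shape (1,) for batch-of-one semantics.
    length = length.unsqueeze(0)
print(length.shape)  # torch.Size([1])
```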
-
- 26 Apr, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add support for lexicon-free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation. Follow-ups: - Add pretrained LM support for lexicon-free decoding - Add example in tutorial - Replace flashlight C++ source code with the flashlight text submodule - [optional] fairseq compatibility test Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
Zhaoheng Ni authored
Summary: In different pre-training and fine-tuning settings, the `mask_prob`, `mask_channel_prob`, and `mask_channel_length` are different. For example, the settings in [pre-training](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L70) and [fine-tuning](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/finetune/base_10h.yaml#L69-L73) are different. The motivation is to avoid overfitting when fine-tuning on a small dataset (example: [fine-tune on 10 minutes of audio](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/finetuning/vox_10m.yaml#L57-L59)). This PR adds the required arguments in the factory functions to make them tunable for pre-training and fine-tuning. `mask_length` is set to `10` by default for all cases, hence it's not included in the factory function. Pull Request resolved: https://github.com/pytorch/audio/pull/2345 Reviewed By: carolineechen, xiaohui-zhang Differential Revision: D35845117 Pulled By: nateanl fbshipit-source-id: 0cbb74d09535d189b8258aa8ee0f88779bdb77e7
-