- 24 May, 2023 5 commits
-
-
moto authored
Summary: Follow-up to https://github.com/pytorch/audio/issues/3045: revert the removal of the HW acceleration doc and comment out the FFmpeg CLI test run. Pull Request resolved: https://github.com/pytorch/audio/pull/3349 Reviewed By: nateanl Differential Revision: D46121899 Pulled By: mthrok fbshipit-source-id: dfc030a69f05addec73637cfb6a720c184e37323
-
moto authored
Summary: * Delay the import of torchaudio until the CLI options are parsed. * Add an option to set the log level to DEBUG so that issues with external libraries are easy to see. Pull Request resolved: https://github.com/pytorch/audio/pull/3346 Reviewed By: nateanl Differential Revision: D46022546 Pulled By: mthrok fbshipit-source-id: 9f988bbd770c2fd2bb260c3cfe02b238a9da2808
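A minimal sketch of the parse-then-import pattern, assuming an `argparse`-based script. The `--debug` flag, `parse_args`, and `main` are hypothetical, and the module name is a parameter only so the sketch runs without torchaudio installed.

```python
import argparse
import importlib
import logging


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the option in the actual script may differ.
    parser.add_argument("--debug", action="store_true", help="Set the log level to DEBUG.")
    return parser.parse_args(argv)


def main(argv=None, module_name="torchaudio"):
    # Parse options first, so `--help` or a bad flag never pays for the heavy import.
    args = parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO, force=True)
    # Import only after logging is configured, so any DEBUG messages the library
    # emits while loading (e.g. about external libraries) are visible.
    module = importlib.import_module(module_name)
    return args, module
```

Running the script with `--debug` then configures the root logger before the library is imported.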
-
moto authored
Summary: This commit changes the way the doc is pushed: it amends the previous commit instead of adding a new one. Currently each commit in gh-pages contains roughly 100MB of data, and the gh-pages branch is fetched by default on `git clone`, so the size of the torchaudio repo grows significantly. Pull Request resolved: https://github.com/pytorch/audio/pull/3345 Reviewed By: nateanl Differential Revision: D46136612 Pulled By: mthrok fbshipit-source-id: 39479ee5d1a6888254ef50f0db252453d976d183
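The amend-based push can be sketched as the git commands below, built from Python so they can be printed as a dry run first. The remote name `origin`, the commit message, and the `amend_and_push_commands` helper are assumptions; the actual CI workflow may differ.

```python
import subprocess


def amend_and_push_commands(branch="gh-pages", remote="origin", message="Update docs"):
    """Build git commands that replace the tip commit of `branch` instead of stacking a new one."""
    return [
        ["git", "add", "--all"],
        # --amend rewrites the previous docs commit, so old builds do not
        # accumulate in the branch history.
        ["git", "commit", "--amend", "-m", message],
        # Rewritten history has to be force-pushed.
        ["git", "push", "--force", remote, f"HEAD:{branch}"],
    ]


def run(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Because the old doc commits become unreachable from `gh-pages`, a fresh clone no longer fetches them.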
-
pbialecki authored
Summary: CC atalman malfet Pull Request resolved: https://github.com/pytorch/audio/pull/3360 Reviewed By: mthrok Differential Revision: D46150898 Pulled By: atalman fbshipit-source-id: 985a0ef69406f48fb15f239d6b16616c0a5379f5
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3366 Reviewed By: nateanl Differential Revision: D46136238 Pulled By: mthrok fbshipit-source-id: 3432f5d007293831bab21460a79ae26b1bbc81a8
-
- 23 May, 2023 6 commits
-
-
Zhaoheng Ni authored
Summary: Resolve https://github.com/pytorch/audio/issues/3347. `position_bias` is ignored in the `extract_features` method. This doesn't affect Wav2Vec2 or HuBERT models, but it changes the output of the transformer layers (except the first layer) in the WavLM model. This PR fixes it by adding `position_bias` to the method. Pull Request resolved: https://github.com/pytorch/audio/pull/3350 Reviewed By: mthrok Differential Revision: D46112148 Pulled By: nateanl fbshipit-source-id: 3d21aa4b32b22da437b440097fd9b00238152596
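The fix amounts to threading the bias returned by each layer into the next call. The toy sketch below uses hypothetical stand-in layers rather than torchaudio's modules: only the first layer computes a relative position bias (as the commit implies for WavLM), so the buggy loop changes every layer after the first, while a model with no bias at all (like Wav2Vec2/HuBERT) is unaffected.

```python
class ToyLayer:
    """Hypothetical stand-in for a transformer layer that may own a relative position bias."""

    def __init__(self, has_relative_attention_bias, bias=1.0):
        self.has_relative_attention_bias = has_relative_attention_bias
        self.bias = bias

    def __call__(self, x, position_bias=None):
        if self.has_relative_attention_bias:
            position_bias = self.bias  # only this layer computes the bias
        if position_bias is not None:
            x = x + position_bias  # the bias changes the layer output
        return x, position_bias


def extract_features_buggy(layers, x):
    outputs = []
    for layer in layers:
        x, _ = layer(x)  # the returned bias is dropped
        outputs.append(x)
    return outputs


def extract_features_fixed(layers, x):
    outputs = []
    position_bias = None
    for layer in layers:
        x, position_bias = layer(x, position_bias=position_bias)  # pass it on
        outputs.append(x)
    return outputs
```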
-
Omkar Salpekar authored
Summary: As discussed in the [Torchaudio Migration Proposal](https://docs.google.com/document/d/1PF8biwiGzsjzfEBM78mlLiRrkcsGsvuYkeqkI66Ym8A/edit), this PR moves the macOS unittest job to Nova tooling. Note that this does not touch the existing CircleCI job at the moment. Passing job: https://github.com/pytorch/audio/actions/runs/4932497525/jobs/8815581251?pr=3324 Pull Request resolved: https://github.com/pytorch/audio/pull/3324 Reviewed By: atalman, mthrok Differential Revision: D46113524 Pulled By: osalpekar fbshipit-source-id: d048d300489f992fa187628cb6744d95ab4fb68a
-
Zhaoheng Ni authored
Summary: Fix https://github.com/pytorch/audio/issues/3361. When adding `FunctionalCUDAOnlyTest`, the class should inherit from `TestBaseMixin` instead of `Functional`. Pull Request resolved: https://github.com/pytorch/audio/pull/3363 Reviewed By: atalman, osalpekar Differential Revision: D46112084 Pulled By: nateanl fbshipit-source-id: 67c6472fda98cb718e0fc53ab248beda745feab5
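One likely consequence of inheriting from `Functional` is that every general test method is collected again under the CUDA-only class. The sketch below shows that with plain `unittest`; the class bodies and the `...Wrong` variants are hypothetical, not torchaudio's test code.

```python
import unittest


class TestBaseMixin:
    """Shared configuration only: no test methods."""

    dtype = None
    device = None


class Functional(TestBaseMixin):
    """General tests, meant to run for every dtype/device combination."""

    def test_general(self):
        self.assertIsNotNone(self.dtype)


class FunctionalCUDAOnlyTestWrong(Functional):
    # Inheriting from Functional also pulls in test_general,
    # so the general tests are collected a second time here.
    def test_cuda_only(self):
        self.assertEqual(self.device, "cuda")


class FunctionalCUDAOnlyTest(TestBaseMixin):
    # Inheriting from TestBaseMixin keeps only the shared configuration.
    def test_cuda_only(self):
        self.assertEqual(self.device, "cuda")


class FunctionalCUDAOnlyFloat32Wrong(FunctionalCUDAOnlyTestWrong, unittest.TestCase):
    dtype = "float32"
    device = "cuda"


class FunctionalCUDAOnlyFloat32(FunctionalCUDAOnlyTest, unittest.TestCase):
    dtype = "float32"
    device = "cuda"
```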
-
moto authored
Summary: When saving audio with Vorbis, BPS (bits per sample) should not be specified; otherwise, warnings that cannot be turned off are shown. Addresses https://github.com/pytorch/audio/issues/3358 Pull Request resolved: https://github.com/pytorch/audio/pull/3359 Reviewed By: nateanl Differential Revision: D46095037 Pulled By: mthrok fbshipit-source-id: 6885a12dc3ec84bf39f0159ee58d1a2a87cff7e4
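Vorbis is a lossy codec that does not store samples at a fixed bit depth, so a bits-per-sample value has nothing to configure. A minimal sketch of the rule, assuming options are assembled in a dict before being handed to the encoder; `build_save_options` is hypothetical and not torchaudio's internal code.

```python
from typing import Dict, Optional, Union

# Only Vorbis is listed, mirroring the commit; other lossy codecs are left out on purpose.
FORMATS_WITHOUT_BPS = {"vorbis"}


def build_save_options(
    fmt: str, bits_per_sample: Optional[int] = None
) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: assemble encoder options, omitting BPS for Vorbis."""
    options: Dict[str, Union[str, int]] = {"format": fmt}
    if bits_per_sample is not None and fmt.lower() not in FORMATS_WITHOUT_BPS:
        options["bits_per_sample"] = bits_per_sample
    return options
```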
-
Omkar Salpekar authored
Summary: As discussed in the [Torchaudio Migration Proposal](https://docs.google.com/document/d/1PF8biwiGzsjzfEBM78mlLiRrkcsGsvuYkeqkI66Ym8A/edit), this PR moves the Linux CPU unittest job to Nova tooling. Note that this does not disable the existing CircleCI job at the moment. Passing Job: https://github.com/pytorch/audio/actions/runs/4986115298/jobs/8926499354?pr=3323 Pull Request resolved: https://github.com/pytorch/audio/pull/3323 Reviewed By: atalman, mthrok Differential Revision: D46113506 Pulled By: osalpekar fbshipit-source-id: 1778c360e17b9d02c63bcc60100834c75798d380
-
Xiaohui Zhang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3356 move the forced aligner tutorial to torchaudio, with some formatting changes Reviewed By: mthrok Differential Revision: D46060238 fbshipit-source-id: d90e7db5669a58d1e9ef5c2ec3c6d175b4e394ec
-
- 22 May, 2023 4 commits
-
-
Omkar Salpekar authored
Summary: Cleaning up CCI configs that are no longer used. Pull Request resolved: https://github.com/pytorch/audio/pull/3340 Reviewed By: mthrok Differential Revision: D46077882 Pulled By: osalpekar fbshipit-source-id: 0dce08fc14b5efc4517ab1f559e7ef7eb245af64
-
Zhaoheng Ni authored
Summary: - Fix LaTeX formula rendering issue - Add `devices` and `properties` tags - Fix grammar Pull Request resolved: https://github.com/pytorch/audio/pull/3357 Reviewed By: mthrok Differential Revision: D46068633 Pulled By: nateanl fbshipit-source-id: 80cb84508396fbcaf81c068228d46a24bb63b975
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3354 When `start == 0`, the first item of row `t` in `backPtr_a`, rather than the S-th item, should be set to 0. Reviewed By: xiaohui-zhang Differential Revision: D46059971 fbshipit-source-id: 89933134878513034eae033764b19f8562f24cb8
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3355 Reviewed By: xiaohui-zhang Differential Revision: D46060254 Pulled By: nateanl fbshipit-source-id: c2e44f994739755daf049fe350dd24a987a9cc29
-
- 21 May, 2023 2 commits
-
-
Moto Hira authored
Differential Revision: D45960556 Original commit changeset: 93f2271f7130 Original Phabricator Diff: D45960556 fbshipit-source-id: d22883fbcf9c5f2bb5d49274bcc194bdffaca72a
-
Xiaohui Zhang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3351 move the forced aligner tutorial to torchaudio, with some formatting changes Reviewed By: vineelpratap, nateanl Differential Revision: D45960556 fbshipit-source-id: 93f2271f71307404e6a7732385cf7d646dc8ceaa
-
- 20 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3348 The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA devices. The function takes the CTC emissions and target labels as inputs and generates the corresponding label for each frame. Reviewed By: vineelpratap, xiaohui-zhang Differential Revision: D45867265 fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb
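CTC forced alignment is usually computed with a Viterbi recursion over the target sequence interleaved with blanks. The following is a minimal pure-Python sketch of that recursion, not torchaudio's CPU/CUDA implementation; `ctc_forced_align`, the nested-list emission format, and the tie-breaking are assumptions. It keeps the best score and a backpointer per (frame, state), then backtracks from the last label or the trailing blank.

```python
import math
from typing import List, Sequence


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]], targets: Sequence[int], blank: int = 0
) -> List[int]:
    """Return one label per frame along the best path that collapses to `targets`.

    log_probs: T x C log-probabilities (CTC emissions) as nested sequences.
    targets: label ids of the transcript, without blanks.
    """
    T = len(log_probs)
    if T == 0:
        return []
    # Extended label sequence: blank, y1, blank, y2, ..., yL, blank
    ext = [blank]
    for y in targets:
        ext += [y, blank]
    S = len(ext)

    alpha = [[-math.inf] * S for _ in range(T)]  # best log-score per (frame, state)
    back = [[0] * S for _ in range(T)]  # predecessor state per (frame, state)

    # A path may start in the leading blank or directly in the first label.
    alpha[0][0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Moves: stay in s, advance from s-1, or skip a blank from s-2
            # (the skip is forbidden between two identical labels).
            cands = [(alpha[t - 1][s], s)]
            if s >= 1:
                cands.append((alpha[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((alpha[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            alpha[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev

    # A valid path ends in the last label or the trailing blank.
    s = S - 1
    if S > 1 and alpha[T - 1][S - 2] > alpha[T - 1][S - 1]:
        s = S - 2
    if alpha[T - 1][s] == -math.inf:
        raise ValueError("Not enough frames to align the targets.")

    # Backtrack through the stored predecessors.
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path
```

Repeated labels need a blank frame between them, which is why the skip transition is disallowed when `ext[s] == ext[s - 2]`.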
-
- 19 May, 2023 1 commit
-
-
moto authored
Summary: This commit adds a step to build FFmpeg with GPU decoder support in the build_doc job so that GPU decoders/encoders can be used in the documentation. Pull Request resolved: https://github.com/pytorch/audio/pull/3045 Reviewed By: nateanl Differential Revision: D45965739 Pulled By: mthrok fbshipit-source-id: c167eb3ef347860a51efa906068fa2daa556f017
-
- 17 May, 2023 4 commits
-
-
moto authored
Summary: This commit improves the performance of converting YUV420P frames from AVFrame to torch Tensor. It changes two things:

1. Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
2. Get rid of the intermediate UV plane copy.

The following compares the time it takes to process 30 seconds of YUV420P frames at 25 FPS with resolution 320x240. The measured times are sorted by value. Some observations:

* `torch::nn::functional::interpolate` with the `torch::kNearest` option is not as fast as copying data manually.
* Switching from `interpolate` to manual data copy reduces the variance.

run | main [s] | 1 [s] | 1+2 [s] | improvement (from main to 1+2)
-- | -- | -- | -- | --
1 | 0.452250583 | 0.417490125 | 0.40155375 | 11.21%
2 | 0.462039958 | 0.42006675 | 0.401764125 | 13.05%
3 | 0.463067666 | 0.42416 | 0.402651334 | 13.05%
4 | 0.464228166 | 0.424545458 | 0.402985667 | 13.19%
5 | 0.465777375 | 0.425629208 | 0.405604625 | 12.92%
6 | 0.469628666 | 0.427044333 | 0.40628525 | 13.49%
7 | 0.475935125 | 0.42805875 | 0.406412167 | 14.61%
8 | 0.482277667 | 0.429921209 | 0.407279 | 15.55%
9 | 0.496695208 | 0.431182792 | 0.442013791 | 11.01%
10 | 0.546653625 | 0.541639584 | 0.4711585 | 13.81%

At a higher resolution, the improvement is smaller but consistent.

run | main [s] | 1+2 [s] | improvement
-- | -- | -- | --
1 | 4.032393 | 3.991784667 | 1.01%
2 | 4.052248084 | 3.992672208 | 1.47%
3 | 4.07705575 | 4.000541666 | 1.88%
4 | 4.143954792 | 4.020671584 | 2.98%
5 | 4.170711959 | 4.025753125 | 3.48%
6 | 4.240229292 | 4.045504875 | 4.59%
7 | 4.267384042 | 4.045588125 | 5.20%
8 | 4.277025958 | 4.061980083 | 5.03%
9 | 4.312192042 | 4.163251959 | 3.45%
10 | 4.406109875 | 4.312560334 | 2.12%

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader


def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=yuv420p")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)


for _ in range(10):
    test()
```

</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A
Python version: 3.9.16 (main, Mar 8 2023, 04:29:24) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU: Apple M1
Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch 2.1.0.dev20230325 py3.9_0 pytorch-nightly
[conda] torchaudio 2.1.0a0+541b525 dev_0 <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3342 Reviewed By: xiaohui-zhang Differential Revision: D45947716 Pulled By: mthrok fbshipit-source-id: 17e5930f57544b4f2e48a9b2185464694a88ab68
-
moto authored
Summary: Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion. It changes two things:

- Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
- Get rid of the intermediate UV plane copy.

With 320x240 video:

run | main [s] | pr [s] | improvement
-- | -- | -- | --
1 | 0.600671417 | 0.464993125 | 22.59%
2 | 0.638846084 | 0.456763542 | 28.50%
3 | 0.64158175 | 0.458295333 | 28.57%
4 | 0.649868584 | 0.455450583 | 29.92%
5 | 0.612171333 | 0.462435625 | 24.46%
6 | 0.6128095 | 0.456716166 | 25.47%
7 | 0.632084583 | 0.463357083 | 26.69%
8 | 0.610733083 | 0.46148625 | 24.44%
9 | 0.613825834 | 0.4559555 | 25.72%
10 | 0.653857458 | 0.455375375 | 30.36%

With 1080x720 video:

run | main [s] | pr [s] | improvement
-- | -- | -- | --
1 | 4.984154333 | 4.21090375 | 15.51%
2 | 4.988090625 | 4.239649375 | 15.00%
3 | 4.988896375 | 4.227277458 | 15.27%
4 | 4.998186584 | 4.161077042 | 16.75%
5 | 5.06180425 | 4.191672584 | 17.19%
6 | 5.108769667 | 4.198468458 | 17.82%
7 | 5.151363625 | 4.181942167 | 18.82%
8 | 5.199527875 | 4.239319084 | 18.47%
9 | 5.224903708 | 4.194901959 | 19.71%
10 | 5.333422583 | 4.320925792 | 18.98%

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader


def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=nv12")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)


for _ in range(10):
    test()
```

</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A
Python version: 3.9.16 (main, Mar 8 2023, 04:29:24) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU: Apple M1
Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch 2.1.0.dev20230325 py3.9_0 pytorch-nightly
[conda] torchaudio 2.1.0a0+541b525 dev_0 <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3344 Reviewed By: xiaohui-zhang Differential Revision: D45948511 Pulled By: mthrok fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5
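Both this commit and the YUV420P one replace `interpolate` with a manual copy for 2x nearest-neighbor chroma upsampling. The pure-Python sketch below illustrates the idea for NV12, whose half-resolution chroma plane interleaves U and V samples. It is not torchaudio's C++ implementation; the function name and the list-of-rows layout are assumptions. Each chroma sample is written straight into a 2x2 block of the full-resolution U or V plane, without first copying the interleaved plane into separate half-resolution planes.

```python
from typing import List, Tuple

Plane = List[List[int]]


def nv12_uv_to_planes(uv: Plane) -> Tuple[Plane, Plane]:
    """Split an interleaved NV12 UV plane and upsample it 2x (nearest neighbor).

    `uv` has H/2 rows, each holding W values laid out as u0, v0, u1, v1, ...
    Returns full-resolution U and V planes of H rows and W columns.
    """
    u_plane: Plane = []
    v_plane: Plane = []
    for row in uv:
        u_row: List[int] = []
        v_row: List[int] = []
        for i in range(0, len(row), 2):
            u, v = row[i], row[i + 1]
            u_row += [u, u]  # duplicate horizontally
            v_row += [v, v]
        # Duplicate vertically by emitting each output row twice.
        u_plane += [u_row, list(u_row)]
        v_plane += [v_row, list(v_row)]
    return u_plane, v_plane
```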
-
Carl Parker authored
Summary: Previously, `breadcrumbs.html` identified a nightly build version by the prefix "Nightly", which would normally be prepended to the version in `conf.py`. However, the version string is coming through without the "Nightly" prefix, so this change makes `breadcrumbs.html` key on the substring "dev" instead. The reason we aren't getting "Nightly" is apparently that the environment variable BUILD_VERSION is set, so `conf.py` uses the value of that env var instead of the version string imported from the `torchaudio` module itself, which actually appears to be incorrect; see below. If I install torchaudio using `conda install torchaudio -c pytorch-nightly`, then `torchaudio.__version__` returns the incorrect version string `2.0.0.dev20230309`. Pull Request resolved: https://github.com/pytorch/audio/pull/3333 Reviewed By: mthrok Differential Revision: D45926466 Pulled By: carljparker fbshipit-source-id: d5516f2d9f1716c2400d3e9b285bd5d32b4b3a77
-
moto authored
Summary: This commit adds support for decoding the YUV420P010LE format. The image tensor returned for this format is in NCHW format (C == 3), of int16 type, with values in the range [0, 2^10). Note that the value range is different from what the "hevc_cuvid" decoder returns. The "hevc_cuvid" decoder uses the full range of int16 (internally, it's uint16) to express the color (with some intervals), but the values returned by the CPU "hevc" decoder are within [0, 2^10). Addresses https://github.com/pytorch/audio/issues/3331 Pull Request resolved: https://github.com/pytorch/audio/pull/3332 Reviewed By: hwangjeff Differential Revision: D45925097 Pulled By: mthrok fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508
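To put both decoders' outputs on the same scale, a conversion like the following may help. It assumes the GPU output follows the P010 layout, where each 10-bit sample occupies the upper 10 bits of a 16-bit word (which would produce the "intervals" of 64 mentioned in the commit). The helper names are hypothetical, and the per-sample functions stand in for the equivalent tensor operations.

```python
def p010_to_10bit(sample: int) -> int:
    """Convert one 16-bit P010 sample (possibly read as signed int16) to a value in [0, 2**10)."""
    unsigned = sample & 0xFFFF  # reinterpret the int16 bit pattern as uint16
    return unsigned >> 6  # keep the 10 significant (most significant) bits


def normalize_10bit(value: int) -> float:
    """Map a value in [0, 2**10) to [0.0, 1.0]."""
    return value / (2**10 - 1)
```

Samples from the CPU "hevc" decoder are already in [0, 2^10) and only need `normalize_10bit`.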
-
- 16 May, 2023 3 commits
-
-
moto authored
Summary: This commit upgrades the FFmpeg version that the TorchAudio binary distribution is compiled against to 5.0.4. FFmpeg 5.0 was released in Jan 2022, and many package managers provide a version of FFmpeg v5. Conda-forge lists 5.1 for all the platforms TorchAudio supports: https://anaconda.org/conda-forge/ffmpeg Pull Request resolved: https://github.com/pytorch/audio/pull/3298 Reviewed By: hwangjeff Differential Revision: D45865599 Pulled By: mthrok fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b
-
moto authored
Summary: TorchAudio has migrated the CTC decoder to flashlight-text, and the code related to the CTC decoder was removed in https://github.com/pytorch/audio/issues/3236. This commit cleans up the residue, removes the third-party libraries used for the CTC decoder, and removes mentions of the environment variable for the CTC decoder. Pull Request resolved: https://github.com/pytorch/audio/pull/3339 Reviewed By: nateanl Differential Revision: D45920878 Pulled By: mthrok fbshipit-source-id: 8d93e64138697781570e5b0b1c9f86e1a7923a89
-
Amir Masoud Nourollah authored
Summary: A redundant "and" just removed. Pull Request resolved: https://github.com/pytorch/audio/pull/3334 Reviewed By: xiaohui-zhang Differential Revision: D45864314 Pulled By: mthrok fbshipit-source-id: ad67bde8fa73eac995fbd0d3809709cc38486884
-
- 15 May, 2023 1 commit
-
-
atalman authored
Summary: Switch Windows nightly builds to GHA. Similar to: https://github.com/pytorch/vision/pull/7578 Pull Request resolved: https://github.com/pytorch/audio/pull/3330 Reviewed By: mthrok Differential Revision: D45871892 Pulled By: atalman fbshipit-source-id: 817490a2abcaffceec5174c624f9e7d0377bbc4a
-
- 11 May, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3328 Make the `AVIOContext`-based constructor protected for better encapsulation. AVFormatContext and optional AVIOContext are managed by StreamReader/Writer, so it's better that they are abstracted away from client code. Reviewed By: hwangjeff Differential Revision: D45779629 fbshipit-source-id: 44c31e8af785447cb47aad0c44bf4ecf1aeebeaa
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3326 Reviewed By: hwangjeff Differential Revision: D45760678 Pulled By: mthrok fbshipit-source-id: 79b5d846c93516ca90c9700279124a9a04470242
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3325 Reviewed By: hwangjeff Differential Revision: D45759434 Pulled By: mthrok fbshipit-source-id: f3b1127fcf3b23beeab61fb7ff18f1b89b11ddc6
-
- 10 May, 2023 4 commits
-
-
moto authored
Summary: This commit makes the backend dispatcher the default. Enabling the backend dispatcher gives the FFmpeg-based I/O implementation higher priority (if the corresponding FFmpeg is available), and allows individual function calls to specify the backend. See also https://github.com/pytorch/audio/issues/2950 Pull Request resolved: https://github.com/pytorch/audio/pull/3241 Reviewed By: hwangjeff Differential Revision: D44709068 Pulled By: mthrok fbshipit-source-id: 43aac3433f78a681df6669e9ac46e8ecf3beb1be
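The selection rule can be sketched as follows. This is not torchaudio's dispatcher code: `select_backend` and `BACKEND_PRIORITY` are hypothetical, and only FFmpeg's place at the front comes from the commit; the order of the remaining backends is an assumption. An explicitly requested backend wins; otherwise the first available backend in priority order is used.

```python
from typing import Iterable, Optional

# FFmpeg first per the commit; the order of sox and soundfile is an assumption.
BACKEND_PRIORITY = ("ffmpeg", "sox", "soundfile")


def select_backend(available: Iterable[str], requested: Optional[str] = None) -> str:
    available = set(available)
    if requested is not None:
        # A per-call request overrides the priority order.
        if requested not in available:
            raise ValueError(f"Backend {requested!r} is not available: {sorted(available)}")
        return requested
    for name in BACKEND_PRIORITY:
        if name in available:
            return name
    raise RuntimeError("No audio backend is available.")
```

In torchaudio releases where the dispatcher is the default, the per-call choice is made through the `backend` argument, e.g. `torchaudio.load(path, backend="ffmpeg")`.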
-
moto authored
Summary: https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/3226 Reviewed By: nateanl Differential Revision: D45402724 Pulled By: mthrok fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262
-
moto authored
Summary: This commit prepares for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241. Making the FFmpeg backend the default causes some issues in tutorials, so this commit disables it. The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 lands to accommodate the change. Since the IO tutorial needs to mention the migration-related changes, I also updated the IO documentation to include the migration work so that it's easy to redirect there. Pull Request resolved: https://github.com/pytorch/audio/pull/3285 Reviewed By: nateanl Differential Revision: D45671237 Pulled By: mthrok fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133
-
Zhaoheng Ni authored
Summary: Address https://github.com/pytorch/audio/issues/2643 - Replace `SGD` optimization with `torch.linalg.lstsq`, which is much faster. - Add autograd test for `InverseMelScale`. - Update other tests. Pull Request resolved: https://github.com/pytorch/audio/pull/3280 Reviewed By: hwangjeff Differential Revision: D45679988 Pulled By: nateanl fbshipit-source-id: a42e8bff9dc0f38e47e0482fd8a2aad902eedd59
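Inverting a mel spectrogram amounts to solving `fb.T @ X = melspec` for the linear spectrogram `X`, where `fb` is the `(n_freqs, n_mels)` filterbank; a single least-squares solve handles this directly instead of iterating SGD. The sketch below uses NumPy's `np.linalg.lstsq` as a stand-in for `torch.linalg.lstsq`. The function name is hypothetical, and clamping negative values to zero is an assumption based on magnitude spectrograms being non-negative.

```python
import numpy as np


def inverse_mel_lstsq(melspec: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """Estimate a linear-frequency spectrogram from a mel spectrogram.

    melspec: (n_mels, time); fb: (n_freqs, n_mels), with melspec = fb.T @ specgram.
    Returns an (n_freqs, time) non-negative estimate.
    """
    # Least-squares (minimum-norm when underdetermined) solution of fb.T @ X = melspec.
    solution, *_ = np.linalg.lstsq(fb.T, melspec, rcond=None)
    # Magnitudes cannot be negative, so clamp (assumption of this sketch).
    return np.maximum(solution, 0.0)
```

When `n_mels < n_freqs` the system is underdetermined, so the unclamped solution reproduces the mel spectrogram but cannot recover detail the filterbank discarded.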
-
- 09 May, 2023 6 commits
-
-
moto authored
Summary: NumPy is an optional runtime dependency of TorchAudio, and it is not required at build time. Pull Request resolved: https://github.com/pytorch/audio/pull/3315 Reviewed By: nateanl Differential Revision: D45702243 Pulled By: mthrok fbshipit-source-id: 6ca6598931764c46be6323868e8cce7c8adc5024
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3296 Reviewed By: hwangjeff Differential Revision: D45503774 fbshipit-source-id: 806c22bd0f54fd0cea43d61ef3dbedd67ffeb012
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3320 Add StreamReaderCustomIO, which is analogous to StreamWriterCustomIO and which takes custom read/seek functions to fetch media data. Reviewed By: hwangjeff Differential Revision: D45482843 fbshipit-source-id: 3ccf771c0fdce153aaa7551053e9a77facedc983
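StreamReaderCustomIO is a C++ class, but the read/seek contract it relies on can be illustrated in Python. The class below is a hypothetical in-memory source: `read(size)` returns at most `size` bytes (and here at most `chunk_size`, imitating a network stream) and empty `bytes` at end of stream, while `seek(offset, whence)` follows the `io` module conventions and returns the new position.

```python
import io


class ChunkedReader:
    """Hypothetical in-memory media source exposing read/seek callbacks."""

    def __init__(self, data: bytes, chunk_size: int = 4096):
        self._data = data
        self._pos = 0
        self._chunk_size = chunk_size

    def read(self, size: int) -> bytes:
        remaining = max(len(self._data) - self._pos, 0)
        if size < 0:  # treat a negative size as "read the rest"
            size = remaining
        n = min(size, self._chunk_size, remaining)
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk  # b"" signals end of stream

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            base = 0
        elif whence == io.SEEK_CUR:
            base = self._pos
        elif whence == io.SEEK_END:
            base = len(self._data)
        else:
            raise ValueError(f"unsupported whence: {whence}")
        if base + offset < 0:
            raise ValueError("negative seek position")
        self._pos = base + offset
        return self._pos
```

On the Python side, `torchaudio.io.StreamReader` documents accepting a file-like object with a `read` method (and uses `seek` when present), so an object like this could be passed as `src`.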
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3319 * Merge the source with StreamWriter * Add docstrings * Move CustomIO to detail::CustomOutput to prepare for adding CustomInput Reviewed By: hwangjeff Differential Revision: D45481807 fbshipit-source-id: 4a9ac8a57acda47b126f8ae18e607b72919f9988
-
Zhaoheng Ni authored
Summary: The batch consistency test function should call `InverseBarkScale` instead of `InverseMelScale`. Pull Request resolved: https://github.com/pytorch/audio/pull/3322 Reviewed By: mthrok Differential Revision: D45691769 Pulled By: nateanl fbshipit-source-id: 4a1ed80c4a56c3a847a49a8d02f8b5cbe4f09045
-
Nikita Shulga authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3321 Reviewed By: atalman, mthrok Differential Revision: D45673225 Pulled By: malfet fbshipit-source-id: f2b915f3307ba95445702e3018254ad254fe2bb3
-