- 12 Jul, 2023 5 commits
-
-
moto authored
Summary: - FFmpeg 6 deprecated attributes - Guard CUDA specific functions not used in CPU builds Pull Request resolved: https://github.com/pytorch/audio/pull/3471 Differential Revision: D47402174 Pulled By: mthrok fbshipit-source-id: 00c0719ab1849b50c0b56b03d8fb38bc7aa74538
-
Bogdan Teleaga authored
Summary: This is a port of https://github.com/adefossez/julius/pull/17 for torchaudio. Not sure if it's possible/desirable to add tests to test the functionality of ONNX exports, but I did a quick test on my machine to ensure this works. The logic is a bit simpler compared to the other PR because the torchaudio version does not support the additional flags available in julius. Pull Request resolved: https://github.com/pytorch/audio/pull/3473 Differential Revision: D47401988 Pulled By: mthrok fbshipit-source-id: 62fa1e4388923f6a62cef2c0f902a79ea179cec4
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3475 Differential Revision: D47403772 Pulled By: mthrok fbshipit-source-id: 5cdde521dbbbbf33856470a9dc79419b4a3a1683
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3474 Differential Revision: D47398447 fbshipit-source-id: f77b685d54ddfc222b806475707d4a10239872f5
-
moto authored
Summary: This commit introduces support for multiple FFmpeg versions for OSS binary distributions. Currently torchaudio only works with FFmpeg 4, which is inconvenient from installation through runtime linking. This commit allows picking FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4. The way it works is that we compile the FFmpeg extension three times, each against a different FFmpeg version, and ship all of them. At runtime, we look for libavutil of a specific version, and when one is found, load the corresponding FFmpeg extension. The order of preference is 6, 5, then 4. To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build. They are LGPL and downloaded from S3 at build time, instead of being built every time. The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces a single FFmpeg version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built so that they support only one specific version of FFmpeg. Pull Request resolved: https://github.com/pytorch/audio/pull/3464 Differential Revision: D47300223 Pulled By: mthrok fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04
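The runtime selection described in this summary can be sketched roughly as follows. This is only an illustration, not torchaudio's actual loader: the names `PREFERENCE`, `linux_probe` and `pick_ffmpeg_version` are hypothetical, and the probe assumes Linux-style shared library names (`libavutil.so.<major>`). The mapping from FFmpeg 6/5/4 to libavutil major versions 58/57/56 follows FFmpeg's own library versioning.

```python
import ctypes
from typing import Callable, Optional

# (FFmpeg major version, libavutil major version), in the order of
# preference given in the summary: 6, 5, then 4.
PREFERENCE = ((6, 58), (5, 57), (4, 56))


def linux_probe(avutil_major: int) -> bool:
    """Hypothetical probe: can libavutil with this major version be loaded?

    Assumes Linux naming (libavutil.so.<major>); other platforms use
    different file names.
    """
    try:
        ctypes.CDLL(f"libavutil.so.{avutil_major}")
        return True
    except OSError:
        return False


def pick_ffmpeg_version(probe: Callable[[int], bool] = linux_probe) -> Optional[int]:
    """Return the first FFmpeg major version whose libavutil is found, else None."""
    for ffmpeg_major, avutil_major in PREFERENCE:
        if probe(avutil_major):
            return ffmpeg_major
    return None
```

Passing the probe as a parameter keeps the preference logic testable on machines that have no FFmpeg installed at all.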
-
- 11 Jul, 2023 4 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3470 Differential Revision: D47374347 Pulled By: mthrok fbshipit-source-id: 003b83e50a70f6e1d06eb196f0be5dbba1640226
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3468 Differential Revision: D47368070 Pulled By: mthrok fbshipit-source-id: 9b5d57b0cb861a2556a1903121f526f8011a0e2d
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3469 Differential Revision: D47368140 Pulled By: mthrok fbshipit-source-id: d82ddb91ae1f6612298486fb8401f95c48db5620
-
moto authored
Summary: Now that we do not build FFmpeg as part of CI build process, we can remove the pre/post build scripts. Needs to land after https://github.com/pytorch/test-infra/pull/4358 Pull Request resolved: https://github.com/pytorch/audio/pull/3466 Reviewed By: atalman Differential Revision: D47367022 Pulled By: mthrok fbshipit-source-id: 17aafff74ee7d269236cffb8a88c803a8d4c44b7
-
- 10 Jul, 2023 1 commit
-
-
moto authored
Summary: 1. Update smoke test script to change directory so that there is no `torchaudio` directory in CWD when the smoke test is executed. 2. Disable the part of the smoke test which requires FFmpeg for wheel. This is preparation for https://github.com/pytorch/test-infra/pull/4358 Pull Request resolved: https://github.com/pytorch/audio/pull/3465 Reviewed By: nateanl Differential Revision: D47345117 Pulled By: mthrok fbshipit-source-id: 95aad0a22922d44ee9a24a05d9ece85166b8c17e
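The directory change matters because, when Python runs a script or `python -c`, the script directory or the current directory comes first on `sys.path`, so a local `torchaudio` source directory can shadow the installed package. A minimal sketch of running an import check from a clean directory is shown below; the helper name `import_from_clean_cwd` is hypothetical and this is not the project's actual smoke test script.

```python
import subprocess
import sys
import tempfile


def import_from_clean_cwd(module: str) -> bool:
    """Import `module` in a fresh interpreter whose CWD is an empty temp dir.

    Running from an empty directory ensures that the installed package is
    imported, not a same-named source directory in the checkout.
    """
    with tempfile.TemporaryDirectory() as tmp:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            cwd=tmp,
            capture_output=True,
        )
    return result.returncode == 0
```

A smoke test could then call `import_from_clean_cwd("torchaudio")` and fail if it returns `False`.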
-
- 07 Jul, 2023 3 commits
-
-
moto authored
Summary: Similar to https://github.com/pytorch/audio/issues/2949 Pull Request resolved: https://github.com/pytorch/audio/pull/3370 Differential Revision: D47298746 Pulled By: mthrok fbshipit-source-id: 0cc0f395772b33f8b2f5f55253d659e451f506c4
-
moto authored
Summary: - Add RGB0/BGR0 support to CPU encoder - Allow passing RGB/BGR when expected format is RGB0/BGR0 Pull Request resolved: https://github.com/pytorch/audio/pull/3428 Differential Revision: D47274370 Pulled By: mthrok fbshipit-source-id: d34d940e04b07673bb86f518fe895c0735912444
-
moto authored
Summary: This commit changes the way the FFmpeg extension is built. Originally, the build process expected the FFmpeg binaries to be somehow available in the build env. This makes the build process unpredictable and prevents enabling the FFmpeg extension by default. The proposed change uses pre-built FFmpeg binaries as a build-time only scaffold, which are built in our CI job https://github.com/pytorch/audio/actions/workflows/ffmpeg.yml. This makes the build process more predictable and removes the necessity to build FFmpeg in our CI. Currently, it supports macOS (arm64, x86_64), unix (x86_64, aarch64) and Windows (amd64). The downside is that it no longer works with architectures not listed above. We could potentially work around this by searching for FFmpeg binaries available on the system (the old way) for these systems, but since they are not supported by PyTorch, the priority is low. Pull Request resolved: https://github.com/pytorch/audio/pull/3460 Differential Revision: D47261885 Pulled By: mthrok fbshipit-source-id: 223a15e95c9140c95688af968beb35ff40354476
-
- 06 Jul, 2023 2 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3462 Differential Revision: D47270241 Pulled By: mthrok fbshipit-source-id: 6a3b02380dfb381ffb47c1f46b46f4833c765246
-
moto authored
Summary: Follow-up to https://github.com/pytorch/audio/pull/3455 The FFMPEG_VERSION env var is not defined in existing CI jobs. Pull Request resolved: https://github.com/pytorch/audio/pull/3459 Reviewed By: atalman Differential Revision: D47249074 Pulled By: mthrok fbshipit-source-id: 20f82d749adef5f45a984ab8125592ef36279e94
-
- 05 Jul, 2023 4 commits
-
-
moto authored
Summary: This reverts commit b7d3e89a. We will use pre-built binaries instead of dlopen. Pull Request resolved: https://github.com/pytorch/audio/pull/3456 Differential Revision: D47239681 Pulled By: mthrok fbshipit-source-id: 0446a62410d914081184fc20c386afa00b1e41b6
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3455 Differential Revision: D47242316 Pulled By: mthrok fbshipit-source-id: 0eb4bdb0a45fccfe9ff97eaed79db63cd7bfc7d8
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3457 Differential Revision: D47241343 Pulled By: mthrok fbshipit-source-id: fd1bfd1531397cb59e9cf11de9dede6949f8517e
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3433 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batched Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`). Reviewed By: mthrok Differential Revision: D46657526 fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89
-
- 03 Jul, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3434 Add one bullet point for `torchaudio.functional` and forced alignment as one example. Reviewed By: mthrok Differential Revision: D46658058 fbshipit-source-id: 6e037b7bb6ed2fc2e27ad1e55c5728c17ce69ce8
-
- 28 Jun, 2023 2 commits
-
-
Pingchuan Ma authored
Summary: Include Conformer/Emformer RNN-T ASR/VSR/AV-ASR link to index.rst Pull Request resolved: https://github.com/pytorch/audio/pull/3441 Differential Revision: D47094158 Pulled By: mthrok fbshipit-source-id: 9ab42ac2bf52a5ce488003897ffba2f10a6ca941
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449 Differential Revision: D47094402 Pulled By: mthrok fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622
-
- 26 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442 Differential Revision: D46797481 Pulled By: mthrok fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe
-
- 21 Jun, 2023 2 commits
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3427 Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms. Reviewed By: mthrok Differential Revision: D46547418 fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960
-
Xiaohui Zhang authored
Summary: Splitting the multilingual example part into another tutorial. Pull Request resolved: https://github.com/pytorch/audio/pull/3443 Reviewed By: mthrok Differential Revision: D46802844 Pulled By: xiaohui-zhang fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998
-
- 16 Jun, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset. This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability. Pull Request resolved: https://github.com/pytorch/audio/pull/3421 Reviewed By: mpc001 Differential Revision: D46799748 Pulled By: mthrok fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
-
- 15 Jun, 2023 1 commit
-
-
moto authored
Summary: * Fix backtrack visualization (the coordinate was off by one). * Add note about the simplification and the new align API * Explicitly handle SOS and EOS Pull Request resolved: https://github.com/pytorch/audio/pull/3440 Reviewed By: xiaohui-zhang Differential Revision: D46761282 Pulled By: mthrok fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8
-
- 14 Jun, 2023 1 commit
-
-
moto authored
Summary: Currently, AudioEffector always resamples to the original sample rate. It is more flexible to allow overriding this with any sample rate. Pull Request resolved: https://github.com/pytorch/audio/pull/3374 Differential Revision: D46235358 Pulled By: mthrok fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4
-
- 13 Jun, 2023 2 commits
-
-
Kyle Finn authored
Summary: This plugin pulls in glib and gtk, which breaks the build on some headless systems. Since the plugin is not actually used, it seems right to disable it. This change fixed the build on my system. Pull Request resolved: https://github.com/pytorch/audio/pull/3436 Differential Revision: D46683297 Pulled By: mthrok fbshipit-source-id: 5b1c1eee1929f4a69a1cc6c7d7bb3ed998ec5872
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3435 Reviewed By: nateanl Differential Revision: D46659362 Pulled By: mthrok fbshipit-source-id: ffa033ad6759de6fd958b63ac51a4a1153ffb45d
-
- 12 Jun, 2023 1 commit
-
-
Chin-Yun Yu authored
Summary: Should resolve https://github.com/pytorch/audio/issues/3425 cc mthrok Pull Request resolved: https://github.com/pytorch/audio/pull/3432 Differential Revision: D46656180 Pulled By: mthrok fbshipit-source-id: 5c534bee2f143ef5cb5e50ec74828012dbcab7e9
-
- 09 Jun, 2023 3 commits
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3422 Differential Revision: D46558184 Pulled By: mthrok fbshipit-source-id: a775c4fb193496d9b2bf9db7bee186ee23512b99
-
moto authored
Summary: The new version of transformers changed the format of pre-trained weights. Fixing it is low-priority for the maintenance team, so we disable the test. See https://github.com/pytorch/audio/issues/3430 Pull Request resolved: https://github.com/pytorch/audio/pull/3431 Differential Revision: D46592883 Pulled By: mthrok fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6
-
moto authored
Summary: StreamWriter's encoding pipeline looks like the following: 1. convert tensor to AVFrame 2. pass AVFrame to AVFilter 3. pass the resulting AVFrame to AVCodecContext (encoder) and AVFormatContext (muxer) When dealing with a CUDA tensor, the AVFilter becomes a no-op, as we have not added support for CUDA-compatible filters. When a CUDA frame is passed, the existing solution passes the software pixel format to AVFilter, which later issues a warning because what AVFilter sees is AV_PIX_FMT_CUDA. Since the filter itself is a no-op, it functions as expected, but this commit fixes the mismatch. See https://github.com/pytorch/audio/issues/3317 Pull Request resolved: https://github.com/pytorch/audio/pull/3426 Differential Revision: D46562370 Pulled By: mthrok fbshipit-source-id: ce0131f1e50bcc826ee036fc0f35db2a5162b660
-
- 08 Jun, 2023 6 commits
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3395 Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`. Reviewed By: mthrok Differential Revision: D46307672 fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5
-
atalman authored
Summary: [Nova] Add cache ffmpeg before building - 2 Follow up after https://github.com/pytorch/audio/pull/3417, need to pass new arguments to test-infra workflows Pull Request resolved: https://github.com/pytorch/audio/pull/3423 Reviewed By: mthrok Differential Revision: D46559344 Pulled By: atalman fbshipit-source-id: fa5cccc3bfb052688de4a05cc3b4f37fcbe3a6f5
-
moto authored
Summary: The StreamReader decoding process is composed of three steps: 1. Decode the incoming AVPacket into AVFrame 2. Pass AVFrame through AVFilter to perform post-processing 3. Convert the resulting AVFrame The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time the output stream is defined and the output stream shape can be retrieved. For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved. However, this is problematic for the GPU decoder, as resizing is currently done using a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. This refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405 AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to be similar, so that it waits until the first frame comes in to finalize the frame shape. Fixes https://github.com/pytorch/audio/issues/3405 Pull Request resolved: https://github.com/pytorch/audio/pull/3419 Differential Revision: D46557505 Pulled By: mthrok fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
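The idea of waiting for the first frame before finalizing the shape can be illustrated with a small sketch. This is not the StreamReader implementation (which is C++); the class `LazyConverter` and the list-of-rows frame representation are stand-ins invented for illustration, and frames are assumed to be non-empty.

```python
from typing import List, Optional, Tuple

# Stand-in for a decoded frame: a non-empty list of equally long pixel rows.
Frame = List[List[int]]


class LazyConverter:
    """Defers fixing the output shape until the first frame arrives."""

    def __init__(self) -> None:
        # Unknown at construction time, e.g. when the decoder resizes frames
        # and offers no way to query the resulting size up front.
        self.shape: Optional[Tuple[int, int]] = None

    def convert(self, frame: Frame) -> Frame:
        if self.shape is None:
            # First frame: take (height, width) from the data itself.
            self.shape = (len(frame), len(frame[0]))
        # The "conversion" here is just a copy of each row.
        return [row[:] for row in frame]
```

The trade-off is that the shape cannot be reported until at least one frame has been processed.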
-
moto authored
Summary: CI jobs are migrated from CCI to GHA Pull Request resolved: https://github.com/pytorch/audio/pull/3420 Differential Revision: D46548562 Pulled By: mthrok fbshipit-source-id: d7e17201e8b256efaa54543e445a0f139aa549b2
-
moto authored
Summary: - Move the unit test scripts from .circleci to .github - Remove docker file for unit test base - Use the Conda from Docker image in Linux jobs. Remaining follow-up items - Reuse the unittest script in the Linux GPU job, as is done in the Linux CPU job. The unit test script needs to be fixed before it can be used for the Linux GPU job in the new GHA workflow. Keeping it as a separate follow-up work item. Pull Request resolved: https://github.com/pytorch/audio/pull/3407 Differential Revision: D46498263 Pulled By: mthrok fbshipit-source-id: d8256717a55bb4257151d819d3b2ebd453601eac
-
Kuba Rad authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3382 The voice activity detector function was unoptimized, confusingly written, and buggy. The optimizations made here allow the function to run roughly 17x faster. The main optimization was to loop over windows of audio rather than individual audio samples. Reducing the number of copies also helped. There was an off-by-one error where the array slice referenced was [1: 16001] (for the default settings) instead of [0: 16000]. Reviewed By: hwangjeff Differential Revision: D44749359 fbshipit-source-id: c76c9412e70cdc6fcd527d113603c88f78480558
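The two points in this summary, iterating per window and the slice bounds, can be illustrated with a toy example. It is not the torchaudio `vad` code: `window_energies` is a hypothetical helper, and mean energy is used only as a placeholder for whatever per-window measurement the real function computes.

```python
from typing import List


def window_energies(samples: List[float], window: int) -> List[float]:
    """Mean energy of each non-overlapping window.

    The loop advances one window at a time rather than one sample at a
    time, and each slice is [start : start + window].
    """
    return [
        sum(x * x for x in samples[start:start + window]) / window
        for start in range(0, len(samples) - window + 1, window)
    ]


# The off-by-one from the summary: both slices hold 16000 samples,
# but the second one starts one sample late.
samples = [float(i) for i in range(16001)]
correct = samples[0:16000]
shifted = samples[1:16001]
```

Because both slices have the same length, the shifted version raises no error and is easy to miss.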
-