- 07 Mar, 2023 1 commit
-
-
Maciej Torhan authored
Summary: The wav2letter example passes `momentum` to the `Adam` and `AdamW` initializers, which is not a valid parameter for them. To fix that, we add `beta_1` and `beta_2` to the arguments and replace `momentum` with them. I also added `eps`, similar to the `Adadelta` initializer. Pull Request resolved: https://github.com/pytorch/audio/pull/3145 Reviewed By: mthrok Differential Revision: D43847713 Pulled By: nateanl fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6
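As a rough illustration of the change described above, the sketch below maps `beta_1`, `beta_2`, and `eps` command-line arguments onto the `betas` and `eps` keyword arguments accepted by `torch.optim.Adam` and `torch.optim.AdamW`. The argument names follow the summary; the `--learning-rate` flag, the default values, and the `build_adam_kwargs` helper are assumptions for illustration, not the code from the pull request.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser with Adam-style hyperparameters (layout is hypothetical)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    # Adam/AdamW take a pair of betas rather than a single momentum value.
    parser.add_argument("--beta_1", type=float, default=0.9)
    parser.add_argument("--beta_2", type=float, default=0.999)
    parser.add_argument("--eps", type=float, default=1e-8)
    return parser


def build_adam_kwargs(args: argparse.Namespace) -> dict:
    """Collect keyword arguments for Adam/AdamW; note there is no `momentum`."""
    return {
        "lr": args.learning_rate,
        "betas": (args.beta_1, args.beta_2),
        "eps": args.eps,
    }


args = make_parser().parse_args(["--beta_1", "0.95"])
adam_kwargs = build_adam_kwargs(args)
# With PyTorch installed, these could be passed as:
#   torch.optim.Adam(model.parameters(), **adam_kwargs)
```

Keeping the two betas as separate flags and combining them into a tuple in one place keeps the command line flat while matching the optimizer's `betas` argument.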
-
- 06 Mar, 2023 1 commit
-
-
Moto Hira authored
Summary: After the series of simplifications, the audio/video encoding processes can be merged, which allows us to get rid of the boilerplate code. Pull Request resolved: https://github.com/pytorch/audio/pull/3146 (Note: this ignores all push blocking failures!) Reviewed By: xiaohui-zhang Differential Revision: D43815640 fbshipit-source-id: 2a14e372b2cc75db7eeabc27d855a24c3f7d5063
-
- 04 Mar, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: The environment variable `TORCHAUDIO_TEST_ALLOW_SKIP_IF_NO_MACOS` needs to be added when running the bash script. Pull Request resolved: https://github.com/pytorch/audio/pull/3144 Reviewed By: mthrok Differential Revision: D43807178 Pulled By: nateanl fbshipit-source-id: 27c57d2efaed5519a12aa027967968895f357c67
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3143 Similar to https://github.com/pytorch/audio/pull/3140, only provide objects which are semantically related to the operation performed by AudioConverter. Reviewed By: xiaohui-zhang Differential Revision: D43781012 fbshipit-source-id: 4795e20f56272af5cfda8a5f46083e60d1890c3e
-
- 03 Mar, 2023 3 commits
-
-
moto authored
Summary: hw_device_ctx and hw_frame_ctx assigned to an AVCodecContext object are owned by libavformat, and get freed in [av_codec_free](https://ffmpeg.org/doxygen/4.1/group__lavc__core.html#gaf869d0829ed607cec3a4a02a1c7026b3) (actually in [avcodec_close](https://ffmpeg.org/doxygen/4.1/libavcodec_2utils_8c_source.html#l01069)), so we do not need to keep the reference around. Pull Request resolved: https://github.com/pytorch/audio/pull/3138 Reviewed By: nateanl Differential Revision: D43738009 Pulled By: mthrok fbshipit-source-id: 8c1f4217fa7b21dce872d12be9245056f3fc7537
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3140 https://github.com/pytorch/audio/pull/3120 introduced a regression in the GPU encoder. This happened because previously source AVPixelFormat (expected channel order of input tensor) and AVCodecContext (encoding format) in converter (module to copy input tensor to buffer), even though the converter does not need to know about the encoding format. This commit fixes the issue and makes sure that the converter does not receive the codec context. Reviewed By: nateanl Differential Revision: D43759162 fbshipit-source-id: f5f191cb54ecc82bd882aececdcae16921250261
-
Zhaoheng Ni authored
Summary: The `playback` function was added in https://github.com/pytorch/audio/issues/3026. The function only supports macOS, hence the tests should be skipped on other OSes. This PR skips the tests on Linux GPU machines on CircleCI. Pull Request resolved: https://github.com/pytorch/audio/pull/3141 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43760546 Pulled By: nateanl fbshipit-source-id: 606907127feee28a66f61baca000a8ef708f8086
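The idea of skipping a platform-specific test can be sketched with the standard library alone. The decorator name `skip_if_not_macos` and the placeholder test body are hypothetical; the helpers actually used in the torchaudio test suite may look different.

```python
import sys
import unittest

# Hypothetical decorator: skip the test unless running on macOS ("darwin").
skip_if_not_macos = unittest.skipIf(
    sys.platform != "darwin", "playback is only supported on macOS"
)


class PlaybackTest(unittest.TestCase):
    @skip_if_not_macos
    def test_playback(self):
        # Placeholder body; a real test would exercise the playback function.
        self.assertEqual(sys.platform, "darwin")


def run_suite() -> unittest.TestResult:
    """Run the test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlaybackTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suite()
```

On a non-macOS machine the test is reported as skipped rather than failed, which is the behaviour the summary asks for on the Linux GPU jobs.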
-
- 02 Mar, 2023 5 commits
-
-
moto authored
Summary: Follow-up https://github.com/pytorch/audio/issues/3130 Pull Request resolved: https://github.com/pytorch/audio/pull/3136 Reviewed By: hwangjeff Differential Revision: D43732991 Pulled By: mthrok fbshipit-source-id: 2e8cb56d96e22546645c82eca362b3c4dcf9c78f
-
moto authored
Summary: Fix build_doc job https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8 - build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL. - Fix bash cell syntax in HW tutorial - Fix C++ doc - Fix duplicated target name in streamwriter tutorial Pull Request resolved: https://github.com/pytorch/audio/pull/3125 Reviewed By: xiaohui-zhang Differential Revision: D43724078 Pulled By: mthrok fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3130 Similar to https://github.com/pytorch/audio/pull/3120 Adopt the generator style slicing conversion to audio encoding process. Reviewed By: nateanl Differential Revision: D43685380 fbshipit-source-id: 3e95655783e5c5d768486f8af6e6b47b0072999b
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3131 In https://github.com/pytorch/audio/pull/3122, the intermediate `num_frames` variable is removed. PTS can be incremented the same way, but the timing was wrong in #3122. This commit fixes it. Reviewed By: xiaohui-zhang Differential Revision: D43712046 fbshipit-source-id: 2fe0082969296f4f3964e62e55b5325fcd45f4f9
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3129 - Add step parameter to support audio slicing - Rename to `SlicingTensorConverter` (`Generator` is too generic.) Reviewed By: xiaohui-zhang Differential Revision: D43704926 fbshipit-source-id: c4bf0ff766e0ae1b5d46b159a6367492ef68f9cd
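A generator-style slicing conversion with a `step` parameter can be pictured with the Python analogue below. This is only a sketch of the general technique: the real `SlicingTensorConverter` is a C++ class operating on tensors, and the function name `iter_slices`, its signature, and the use of plain lists here are assumptions.

```python
from typing import Iterator, List, Sequence


def iter_slices(samples: Sequence[float], step: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `step` items from `samples`."""
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, len(samples), step):
        # The last slice may be shorter than `step`.
        yield samples[start:start + step]


# Usage: consume the slices one at a time, as an encoder loop would.
chunks: List[List[float]] = [list(c) for c in iter_slices(list(range(10)), step=4)]
```

Because the slices are produced lazily, the consumer can process each chunk as it arrives instead of materializing all of them up front.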
-
- 01 Mar, 2023 6 commits
-
-
Zhaoheng Ni authored
Summary: `Dict` is not used. Fix stylecheck by removing the import of `Dict`. Pull Request resolved: https://github.com/pytorch/audio/pull/3126 Reviewed By: mthrok Differential Revision: D43699410 Pulled By: nateanl fbshipit-source-id: 8d6b5335124903453387c488f96f297d6fe3c819
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3122 - Remove manual tracking of num_frames - Remove unnecessary dispatch in AudioOutputStream Reviewed By: nateanl Differential Revision: D43685746 fbshipit-source-id: a7e62a81549fb62ad0caa3b741655eba3bc5e250
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3120 This commit extracts image conversion ops into the ImageTensorConverter class, and makes it independent from the OutputStream class. The ImageTensorConverter class implements a range-based for-loop interface, like ``` for (auto const& frame : ImageTensorConverter::convert(...)) { post_process_with_avframe(frame); } ``` This allows decoupling the encoder from image conversion. Reviewed By: nateanl Differential Revision: D43666296 fbshipit-source-id: 754efe677bc7695b3f138a6d076be2106e186b79
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3123 Moving the I/O usage logging to C++, so that C++ usages are also covered. Reviewed By: nateanl Differential Revision: D43686567 fbshipit-source-id: ad357028dd69eedb8bc2a2482fe07e95757a3a62
-
Zhaoheng Ni authored
Summary: `sox` is not available on Windows machines. Add skip decorators to the sox related tests to skip running tests on Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/3119 Reviewed By: mthrok Differential Revision: D43682754 Pulled By: nateanl fbshipit-source-id: f69987dac8232a3569be83f096b32389bd8bda81
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3121 After careful review, it turned out device arg in VideoOutputStream constructor and related helper functions can be replaced with AVCodecContext::pix_fmt == AV_PIX_FMT_CUDA. Reviewed By: xiaohui-zhang Differential Revision: D43677801 fbshipit-source-id: f8f34f1aed46e223b44250d39cccc4cd26ecb458
-
- 28 Feb, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3113 Decouple the Tensor to AVFrame conversion process from encoding process. Reviewed By: nateanl Differential Revision: D43628942 fbshipit-source-id: e698f3150292567dbc23e7d6795ad58265f24780
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3109 Change the logic around StreamWriter preprocessing. Currently, no preprocessing is expressed as `nullptr` to `unique_ptr<FilterGraph>`. This commit changes it to the `[a]null` filter, which is just a pass-through. This makes the code a bit simpler, and serves as better preparation for adding filters for the CUDA process. Reviewed By: xiaohui-zhang Differential Revision: D43593321 fbshipit-source-id: 9ca71c2c8bf652384a0f56b4c41b32d908f61201
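Replacing a "no filter" null pointer with a pass-through filter is an instance of the null-object pattern. The Python sketch below illustrates the idea only; the `Writer` class, `null_filter`, and `make_gain_filter` are hypothetical names and do not mirror the C++ StreamWriter code.

```python
from typing import Callable, List, Optional

Frame = List[float]
Filter = Callable[[Frame], Frame]


def null_filter(frame: Frame) -> Frame:
    """Pass-through filter: returns the frame unchanged."""
    return frame


def make_gain_filter(gain: float) -> Filter:
    """Return a filter that scales every sample by `gain`."""
    def apply(frame: Frame) -> Frame:
        return [sample * gain for sample in frame]
    return apply


class Writer:
    def __init__(self, filter_: Optional[Filter] = None) -> None:
        # Always hold a callable, so write() needs no "is a filter set?" branch.
        self._filter: Filter = filter_ if filter_ is not None else null_filter

    def write(self, frame: Frame) -> Frame:
        return self._filter(frame)


unfiltered = Writer().write([1.0, 2.0])
amplified = Writer(make_gain_filter(2.0)).write([1.0, 2.0])
```

With the pass-through in place, every frame takes the same code path whether or not preprocessing was requested, which is what removes the special-case branches.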
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3108 - Introduce process_frame method - De-dupe validation logic Reviewed By: xiaohui-zhang Differential Revision: D43632390 fbshipit-source-id: 76b7ca0beb725acf686269c877a62e1256921b28
-
- 27 Feb, 2023 5 commits
-
-
Zhaoheng Ni authored
Summary: Add pre-trained pipeline support for `SquimObjective` model. The pre-trained model is trained on DNS 2020 challenge dataset. Pull Request resolved: https://github.com/pytorch/audio/pull/3103 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43611794 Pulled By: nateanl fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3105 Refactor the construction of Audio/VideoOutputStream Reviewed By: nateanl Differential Revision: D43613013 fbshipit-source-id: 0e112cb1bab2658be68a368099ed00ef318ea4f1
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3106 Refactor Audio/VideoOutputStream. Reviewed By: nateanl Differential Revision: D43613008 fbshipit-source-id: 36c62fe00903066982573866d07de4e79b34240d
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3104 Continuation of the StreamWriter refactoring. This commit extracts Encoder (+muxer) from OutputStream. Reviewed By: nateanl Differential Revision: D43610887 fbshipit-source-id: 30a9862b1aabd5af331ce3f33a5815df1decbad1
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3100 Refactor StreamWriter and move OutputStream to a dedicated source file, then split it into separate audio/video classes. Reviewed By: nateanl Differential Revision: D43587337 fbshipit-source-id: 0fdbd1f56a7200dc6849e95eb9678854f5d933b8
-
- 25 Feb, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3099 Reviewed By: mthrok Differential Revision: D43596866 Pulled By: nateanl fbshipit-source-id: 43a139bf8ebdf3261414e2855aefc3b53df298ac
-
- 24 Feb, 2023 5 commits
-
-
Vladislav Agafonov authored
Summary: Add `Wav2Vec2DataModule` in self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3081 Reviewed By: mthrok Differential Revision: D43579239 Pulled By: nateanl fbshipit-source-id: 3e935eb9a18ef0259a58940ae466cbdc3baf8494
-
Vladislav Agafonov authored
Summary: Add wav2vec2 loss function in the self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3090 Reviewed By: mthrok Differential Revision: D43579220 Pulled By: nateanl fbshipit-source-id: 4b52792b518ddc5b01c9660c90ceb3c4ad1f0237
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3095 Reviewed By: nateanl Differential Revision: D43544998 Pulled By: mthrok fbshipit-source-id: 4359cdbbdbee53084016a84129cb3d65900b0457
-
moto authored
Summary: This commit is kind of clean up and preparation for future development. We plan to pass around more complicated objects among StreamReader and StreamWriter, and TorchBind is not expressive enough for defining intermediate object, so we use PyBind11 for binding StreamWriter. Pull Request resolved: https://github.com/pytorch/audio/pull/3091 Reviewed By: xiaohui-zhang Differential Revision: D43515714 Pulled By: mthrok fbshipit-source-id: 9097bb104bbf8c1536a5fab6f87447c08b10a7f2
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3084 Reviewed By: mthrok Differential Revision: D43550150 Pulled By: nateanl fbshipit-source-id: 5c5e3d9461e375be202493e3399ff38ce5cd7690
-
- 23 Feb, 2023 5 commits
-
-
moto authored
Summary: This commit is kind of a clean-up and preparation for future development. We plan to pass around more complicated objects among StreamReader and StreamWriter, and TorchBind is not expressive enough for defining intermediate objects, so we want to use PyBind11 for binding StreamReader/Writer. PyBind11 converts a Python dict into std::map, while TorchBind converts it into c10::Dict. Because of this discrepancy, conversion from c10::Dict to std::map has to happen in multiple places, and this makes the binding code thicker as it requires wrapper methods. Using std::map reduces the number of wrapper methods / conversions, because the same method can be bound for file-like objects and the others. Pull Request resolved: https://github.com/pytorch/audio/pull/3092 Reviewed By: nateanl Differential Revision: D43524808 Pulled By: mthrok fbshipit-source-id: f7467c66ccd37dbf4abc337bbb18ffaac21a0058
-
G. Sun authored
Summary: This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing. An example for Librispeech can be found in audio/examples/asr/librispeech_biasing. Maintainer's note (mthrok): It seems that TrieNode should be better typed as tuple, but changing the implementation from list to tuple could cause some issue without running the code, so the code is not changed, though the annotation uses tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2890 Reviewed By: nateanl Differential Revision: D43171447 Pulled By: mthrok fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e
-
mthrok authored
Summary: Remove the Tensor input support from StreamReader Follow up of https://github.com/pytorch/audio/pull/3086 Pull Request resolved: https://github.com/pytorch/audio/pull/3093 Reviewed By: xiaohui-zhang Differential Revision: D43526066 Pulled By: mthrok fbshipit-source-id: 57ba4866c413649173e1c2c3b23ba7de3231b7bc
-
moto authored
Summary: The same functionality can be achieved with passing io.BytesIO to the constructor. Pull Request resolved: https://github.com/pytorch/audio/pull/3086 Reviewed By: nateanl Differential Revision: D43500360 Pulled By: mthrok fbshipit-source-id: 2c6f37d100f50553b283c75c04fe57c8f9c07dc9
-
moto authored
Summary: 1. Fix spacing. 2. Move it to after successful import 3. Add link to the announcement issue Pull Request resolved: https://github.com/pytorch/audio/pull/3089 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D43514075 Pulled By: mthrok fbshipit-source-id: 3b2a24c65c63dab8c12c9c6aa1942a8354b2c0f1
-
- 22 Feb, 2023 3 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3087 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43509865 Pulled By: nateanl fbshipit-source-id: 569cc2ee8edd9de0b7d255a1e1075ac812b26cc8
-
Zhaoheng Ni authored
Summary: The negative sampling should be applied to unmasked features in masked indices. This PR fixes the logic in ConformerWav2Vec2PretrainModel. Pull Request resolved: https://github.com/pytorch/audio/pull/3085 Reviewed By: mthrok Differential Revision: D43488570 Pulled By: nateanl fbshipit-source-id: 3820400d50b74216bb98ca6a40dc6a7acca01564
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3042 Reviewed By: mthrok Differential Revision: D43405932 Pulled By: nateanl fbshipit-source-id: 88f6dabae35565b699230e9909b8f68f4a57f5c7
-