- 21 Mar, 2023 6 commits
-
-
Zhaoheng Ni authored
Summary: Add model architecture and factory functions for `SquimSubjective`, which predicts subjective evaluation metric scores (e.g. MOS) for the speech enhancement task. Pull Request resolved: https://github.com/pytorch/audio/pull/3189 Reviewed By: mthrok Differential Revision: D44267255 Pulled By: nateanl fbshipit-source-id: f8060398b14c625b38ea1bb2417f61aeaec3f1db
-
moto authored
Summary: To suppress local warning of flake8 <120 Pull Request resolved: https://github.com/pytorch/audio/pull/3191 Reviewed By: nateanl Differential Revision: D44263027 Pulled By: mthrok fbshipit-source-id: b3e48dba21fc5c9813f07e624a93f38a68956c6e
-
moto authored
Summary: `oscillator_bank` performs cumsum on a large number of elements, and float32 is typically not precise enough for that. This PR makes the cumsum operation default to float64 so that the result is more accurate. Pull Request resolved: https://github.com/pytorch/audio/pull/3083 Reviewed By: nateanl Differential Revision: D44257182 Pulled By: mthrok fbshipit-source-id: a38a465d33559a415e8c744e61292f4fab64b0e1
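The precision issue described above can be reproduced without any audio library. The following is a minimal sketch, not the `oscillator_bank` implementation: it emulates a float32 accumulator with the standard `struct` module, and the tone frequency, sample rate, and step count are assumed values chosen only for illustration.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cumsum_last(increment: float, num_steps: int, single_precision: bool) -> float:
    """Return the final value of a running sum of `increment` over `num_steps`."""
    if single_precision:
        increment = to_float32(increment)
    acc = 0.0
    for _ in range(num_steps):
        acc += increment
        if single_precision:
            # Emulate a float32 accumulator by rounding after every addition.
            acc = to_float32(acc)
    return acc


# Assumed settings: phase increments of a 440 Hz tone at 16 kHz over 100,000 samples.
increment = 2 * math.pi * 440 / 16000
num_steps = 100_000
reference = increment * num_steps

err32 = abs(cumsum_last(increment, num_steps, single_precision=True) - reference)
err64 = abs(cumsum_last(increment, num_steps, single_precision=False) - reference)
print(f"float32 error: {err32:.3e}, float64 error: {err64:.3e}")
```

The float32 accumulator loses precision as the running sum grows, because the spacing between representable values increases with magnitude while the increment stays small.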
-
moto authored
Summary: Fixes the issue https://app.circleci.com/pipelines/github/pytorch/audio/15501/workflows/ebaa2c87-efc3-44a8-b86d-5a3b99870588/jobs/1164478 Pull Request resolved: https://github.com/pytorch/audio/pull/3190 Reviewed By: nateanl Differential Revision: D44263564 Pulled By: mthrok fbshipit-source-id: e610be3a91888c859ebdc31081b2d1ba9d61737e
-
Zhaoheng Ni authored
Summary: In librosa 0.10 release, positional arguments are deprecated (see https://github.com/librosa/librosa/pull/1521 for details). The PR fixes the HiFiGAN unit test by using keyword arguments for `librosa.filters.mel` function. Pull Request resolved: https://github.com/pytorch/audio/pull/3185 Reviewed By: mthrok Differential Revision: D44218852 Pulled By: nateanl fbshipit-source-id: 6171f7bec6a2144917697c1d640e701d95ec60d7
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3188 Refactor the process after decoding in StreamReader. The post-decode process consists of three parts: 1. preprocessing using FilterGraph 2. conversion to Tensor 3. storage in Buffer. The FilterGraph class is a thin wrapper around the AVFilterGraph structure from FFmpeg and it is agnostic to media type. However, Tensor conversion and buffering consist of a bunch of different logic. Currently, the conversion process is abstracted away with a template, i.e. `template<typename Conversion> Buffer`, and the whole process is implemented in the Sink class, which consists of `FilterGraph` and `Buffer`, the latter internally containing the Conversion logic, even though conversion logic and buffer have nothing in common and are better logically separated. The new implementation replaces the `Sink` class with the `IPostDecodeProcess` interface, which contains the three components. The different post process is implemented as a template argument of the actual implementation, i.e. ```c++ template<typename Converter, typename Buffer> ProcessImpl : IPostDecodeProcess ``` and stored as `unique_ptr<IPostDecodeProcess>` on `StreamProcessor`. ([functionoid pattern](https://isocpp.org/wiki/faq/pointers-to-members#functionoids), which allows eliminating all the branching based on the media format.) Note: This implementation was not possible in the initial version of StreamReader, as there was no way of knowing the media attributes coming out of `AVFilterGraph`. https://github.com/pytorch/audio/pull/3155 and https://github.com/pytorch/audio/pull/3183 added features to parse it properly, so we can finally make the post processing strongly-typed. Reviewed By: hwangjeff Differential Revision: D44242647 fbshipit-source-id: 96b8c6c72a2b8af4fa86a9b02292c65078ee265b
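The structure described above is C++, but the idea can be sketched in Python. This is a rough analogue under stated assumptions, not the actual torchaudio code: the class names mirror the commit message, while the method names (`process`, `pop`) and the stand-in stages passed to the constructor are hypothetical.

```python
from abc import ABC, abstractmethod
from collections import deque


class IPostDecodeProcess(ABC):
    """Type-erased interface that the stream processor holds on to."""

    @abstractmethod
    def process(self, frame):
        """Run one decoded frame through the post-decode stages."""

    @abstractmethod
    def pop(self):
        """Return the oldest buffered chunk, or None if the buffer is empty."""


class ProcessImpl(IPostDecodeProcess):
    """Composes the three stages: filter, convert, buffer."""

    def __init__(self, filter_fn, converter, buffer):
        self.filter_fn = filter_fn  # stage 1: media-agnostic preprocessing
        self.converter = converter  # stage 2: format-specific conversion
        self.buffer = buffer        # stage 3: storage

    def process(self, frame):
        # A filter may emit zero or more frames per input frame.
        for filtered in self.filter_fn(frame):
            self.buffer.append(self.converter(filtered))

    def pop(self):
        return self.buffer.popleft() if self.buffer else None


# Hypothetical stages: a null filter and an int16-to-float converter.
proc = ProcessImpl(
    filter_fn=lambda frame: [frame],
    converter=lambda frame: [sample / 32768 for sample in frame],
    buffer=deque(),
)
proc.process([0, 16384, -32768])
first = proc.pop()   # [0.0, 0.5, -1.0]
second = proc.pop()  # None, the buffer is empty again
```

The caller only sees `IPostDecodeProcess`, so the choice of converter and buffer is made once at construction and no media-format branching remains at the call site.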
-
- 20 Mar, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3184 Tweak internals of StreamReader 1. Pass time_base to Buffer class so that * there is no need to pass frame_duration separately * conversion of PTS to double type can be delayed until it is popped 2. Merge `get_output_timebase` method into `get_output_stream_info`. 3. If filter description is not provided, fill in null filter at top-level StreamReader 4. Expose filter and filter description from Sink class to get rid of wrapper get methods. Reviewed By: nateanl Differential Revision: D44207976 fbshipit-source-id: f25ac9be69c9897e9dcec0c6e978f29b83b166e8
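The first point can be illustrated with a small Python sketch. It is an assumption-laden analogue of the C++ Buffer, not its real interface: the `push`/`pop` method names and the 1/90000 time base are hypothetical.

```python
from collections import deque
from fractions import Fraction


class Buffer:
    """Stores chunks with their integer PTS and converts to seconds only on pop."""

    def __init__(self, time_base: Fraction):
        self.time_base = time_base
        self._chunks = deque()

    def push(self, pts: int, data) -> None:
        # Keep the PTS as an exact integer count of time_base units.
        self._chunks.append((pts, data))

    def pop(self):
        pts, data = self._chunks.popleft()
        # The conversion to a floating-point timestamp is delayed until here.
        return float(pts * self.time_base), data


buf = Buffer(time_base=Fraction(1, 90000))  # assumed time base
buf.push(180000, "frame")
seconds, data = buf.pop()  # seconds is 2.0
```

Because the buffer knows the time base, the caller does not have to hand over a precomputed frame duration or timestamp with every push.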
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3186 Fix the GPU memory leak introduced in https://github.com/pytorch/audio/pull/3183 The HW frames context is owned by AVCodecContext. The removed `av_buffer_ref` call incremented the reference count unnecessarily and prevented AVCodecContext from freeing the resource. (Note: this ignores all push blocking failures!) Reviewed By: nateanl Differential Revision: D44231876 fbshipit-source-id: 9be2c33049dd02a3fa82a85271de7fb62e5b09ea
-
moto authored
Summary: This commit adds CUDA frame support to FilterGraph. It initializes and attaches a CUDA frames context to FilterGraph, so that CUDA frames can be processed in FilterGraph. As a result, it enables 1. CUDA filter support such as `scale_cuda` 2. Proper retrieval of the pixel format coming out of FilterGraph when CUDA HW acceleration is enabled. (currently it is reported as "cuda") Resolves https://github.com/pytorch/audio/issues/3159 Pull Request resolved: https://github.com/pytorch/audio/pull/3183 Reviewed By: hwangjeff Differential Revision: D44183722 Pulled By: mthrok fbshipit-source-id: 522d21039c361ddfaa87fa89cf49c19d210ac62f
-
- 17 Mar, 2023 4 commits
-
-
moto authored
Summary: TODO: add cache release Pull Request resolved: https://github.com/pytorch/audio/pull/3178 Reviewed By: hwangjeff Differential Revision: D44136275 Pulled By: mthrok fbshipit-source-id: 4eaf646fe17a469e8bbbdf43441d5532f9f8461d
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3181 Reviewed By: nateanl Differential Revision: D44167788 Pulled By: mthrok fbshipit-source-id: 375293df836456adc40020d323efbc0aebc60d83
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3182 Reviewed By: nateanl Differential Revision: D44167810 Pulled By: mthrok fbshipit-source-id: 6ecbae54224ef7ba32835e4006aa5f2dc16b9acb
-
moto authored
Summary: Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level. Pull Request resolved: https://github.com/pytorch/audio/pull/3179 Pull Request resolved: https://github.com/pytorch/audio/pull/3164 Reviewed By: mthrok Differential Revision: D43861413 Pulled By: hwangjeff fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161
-
- 16 Mar, 2023 2 commits
-
-
jiyuntu-eero authored
Summary: Fix https://github.com/pytorch/audio/issues/3166. In `get_trellis` method, the index of blank symbol is regarded as 0 by default. It should be changed to `blank_id`. Pull Request resolved: https://github.com/pytorch/audio/pull/3172 Reviewed By: mthrok Differential Revision: D44090889 Pulled By: nateanl fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264
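To show where the blank index matters, here is a simplified pure-Python trellis. It is a sketch under assumptions, not the tutorial's tensor-based `get_trellis`: the emission is a list of per-frame log-probabilities, and the boundary handling is reduced to the minimum needed to show that every blank lookup goes through `blank_id`.

```python
import math


def get_trellis(emission, tokens, blank_id=0):
    """Build a (num_frames + 1) x (num_tokens + 1) trellis of log-scores.

    emission[t][c] is the log-probability of label c at frame t.
    Row 0 and column 0 are extra "start" entries.
    """
    num_frames = len(emission)
    num_tokens = len(tokens)
    trellis = [[-math.inf] * (num_tokens + 1) for _ in range(num_frames + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frames):
        # Column 0: only blank has been emitted so far, so use blank_id, not 0.
        trellis[t + 1][0] = trellis[t][0] + emission[t][blank_id]
        for j in range(1, num_tokens + 1):
            # Score for staying at the same token (emit blank).
            stay = trellis[t][j] + emission[t][blank_id]
            # Score for moving on to the next token.
            change = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            trellis[t + 1][j] = max(stay, change)
    return trellis


# Assumed toy input: three labels, with the blank symbol at index 2.
emission = [[-1.0, -2.0, -0.1], [-0.5, -3.0, -0.2]]
trellis = get_trellis(emission, tokens=[0], blank_id=2)
```

With the blank at index 2, hard-coding column 0 would accumulate the scores of a real label into the "blank only" column, which is the kind of error the fix addresses.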
-
moto authored
Summary: Currently, when the Buffer converts AVFrame* to torch::Tensor, it checks the format each time a frame is passed, and then performs the conversion. This commit changes it so that the conversion operation is pre-instantiated at the time the output stream is configured. It introduces Converter implementations for various formats, and uses templates to embed them in the Buffer class. This way, branching such as if/switch is eliminated from the decoding path. Pull Request resolved: https://github.com/pytorch/audio/pull/3170 Reviewed By: xiaohui-zhang Differential Revision: D44048293 Pulled By: mthrok fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f
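The change from per-frame format checks to a converter chosen once can be sketched in Python. The actual change uses C++ templates; the format names, converter functions, and `Buffer` interface below are hypothetical stand-ins.

```python
import struct
from typing import Callable, Dict, List

Converter = Callable[[bytes], List[float]]


def convert_u8(raw: bytes) -> List[float]:
    """Unsigned 8-bit samples to floats in [-1, 1)."""
    return [(b - 128) / 128 for b in raw]


def convert_s16(raw: bytes) -> List[float]:
    """Little-endian signed 16-bit samples to floats in [-1, 1)."""
    count = len(raw) // 2
    return [s / 32768 for s in struct.unpack(f"<{count}h", raw[: count * 2])]


CONVERTERS: Dict[str, Converter] = {"u8": convert_u8, "s16": convert_s16}


class Buffer:
    def __init__(self, sample_format: str):
        # The format lookup happens once, when the stream is configured.
        self._convert = CONVERTERS[sample_format]
        self.chunks: List[List[float]] = []

    def push(self, raw: bytes) -> None:
        # No per-frame format check on the decoding path.
        self.chunks.append(self._convert(raw))


buf = Buffer("s16")
buf.push(struct.pack("<2h", 16384, -32768))
```

After the push, `buf.chunks` holds one converted chunk, and the per-frame path never inspected the sample format.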
-
- 15 Mar, 2023 2 commits
-
-
Carl Parker authored
Summary: - Boldface the version-selection UX and increase size by three percent. - Add text to breadcrumbs to indicate version and stability. - New `breadcrumbs.html` in `_templates` overrides Sphinx version. I create a new variable in `conf.py`, **version_stable**, which has the version number for the most-recent stable release. I define this variable in the **html_context** dictionary so that it is visible to the templates. I use this approach because I was not able to find any other way of discerning the current stable release during the build. Note that the `versions.html` file--which identifies the current stable release--appears to be available only in the **gh-pages** branch and so it is not available at build time. However, this means that someone will need to update `conf.py` whenever the current stable release changes. Pull Request resolved: https://github.com/pytorch/audio/pull/3167 Reviewed By: mthrok Differential Revision: D44112224 Pulled By: carljparker fbshipit-source-id: e76f5cb6734a784d161342964459577aa9b64cac
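A minimal sketch of the `conf.py` fragment described above. The variable and dictionary names come from the commit message; the version string is an assumed placeholder that has to be updated by hand.

```python
# conf.py (fragment)

# Assumed placeholder: update whenever the current stable release changes.
version_stable = "2.0.0"

html_context = {
    # Entries here are visible to templates such as _templates/breadcrumbs.html.
    "version_stable": version_stable,
}
```

Inside a template, the value would then be available as `{{ version_stable }}`.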
-
Zhaoheng Ni authored
Summary: Autograd test randomly fails for MFCC transform. Fix it by increasing `nondet_tol` to `1e-10`. Pull Request resolved: https://github.com/pytorch/audio/pull/3169 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D44069673 Pulled By: nateanl fbshipit-source-id: addafefe381104e778b09bfbaafb322df1d9054c
-
- 14 Mar, 2023 2 commits
-
-
hwangjeff authored
Summary: Adds documentation that introduces forthcoming I/O backend revision and provides enablement directions for the current release. Doc pages: https://output.circle-artifacts.com/output/job/9c0e5a49-eaf4-404c-b910-ca1b18bb289b/artifacts/0/docs/torchaudio.html Pull Request resolved: https://github.com/pytorch/audio/pull/3147 Reviewed By: mthrok Differential Revision: D43824019 Pulled By: hwangjeff fbshipit-source-id: ad21d60c7e8f69f64859c56a8ca75735ddc22e40
-
Zhaoheng Ni authored
Summary: Add `2.0.0` release to the compatibility matrix Pull Request resolved: https://github.com/pytorch/audio/pull/3168 Reviewed By: mthrok Differential Revision: D44059197 Pulled By: nateanl fbshipit-source-id: a2830d059be90eddeab72b30e85cdfc393369bf8
-
- 09 Mar, 2023 2 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3157 AVCodecContext plays a central role in decoding and encoding. Currently in StreamReader, the object is owned inside the Decoder class and is not accessible from other objects. This commit moves the ownership of AVCodecContext out of Decoder to the StreamProcessor class so that other components can access its fields. Also, the Decoder class, which is a very thin wrapper around the AVCodecContext object, is now absorbed into the StreamProcessor class. Reviewed By: xiaohui-zhang Differential Revision: D43924664 fbshipit-source-id: e53254955d9ce16871e393bcd8bb2794ce6a51ff
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3156 Remove helper methods that are not worthy of being private methods. Reviewed By: xiaohui-zhang Differential Revision: D43919385 fbshipit-source-id: 2ce4efaf5ec9418076e78c7ce1f842e0dd7e3028
-
- 08 Mar, 2023 3 commits
-
-
cai525 authored
Summary: Address #3101. The documentation for `power=1` should represent magnitude instead of energy. Pull Request resolved: https://github.com/pytorch/audio/pull/3134 Reviewed By: mthrok Differential Revision: D43910652 Pulled By: nateanl fbshipit-source-id: e0768438e819222a5dde6b86c5123ab0e8af59fb
-
moto authored
Summary: This commit adds fields to OutputStream that show the result of filters, such as width and height after filtering. Before ``` OutputStream( source_index=0, filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray') ``` After ``` OutputVideoStream( source_index=0, filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray', media_type='video', format='gray', width=320, height=320, frame_rate=3.0) ``` Pull Request resolved: https://github.com/pytorch/audio/pull/3155 Reviewed By: nateanl Differential Revision: D43882399 Pulled By: mthrok fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3135 Reviewed By: xiaohui-zhang Differential Revision: D43724273 Pulled By: mthrok fbshipit-source-id: 9b52823618948945a26e57d5b3deccbf5f9268c1
-
- 07 Mar, 2023 5 commits
-
-
moto authored
Summary: FFmpeg 5 introduced a new API for channel configuration, and channel_layout is deprecated. This commit fixes one of the deprecation messages. Pull Request resolved: https://github.com/pytorch/audio/pull/3149 Reviewed By: nateanl Differential Revision: D43874808 Pulled By: mthrok fbshipit-source-id: 3e76e8c8f1f34758b1014a426e77260e663b18ee
-
Zhaoheng Ni authored
Summary: `filtfilt` function uses `lfilter`, which calls `conv_1d` operation internally. `conv_1d` is expected to have autograd test failures (see https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html). The PR uses deterministic algorithms in the autograd tests to make `filtfilt` related tests pass. Pull Request resolved: https://github.com/pytorch/audio/pull/3150 Reviewed By: mthrok Differential Revision: D43872977 Pulled By: nateanl fbshipit-source-id: c3d6ec281f34db8a7092526ccb245797bf2338da
-
Zhaoheng Ni authored
Summary: An autograd test randomly failed on the Linux GPU machine. Increase `nondet_tol` to make it pass. Pull Request resolved: https://github.com/pytorch/audio/pull/3154 Reviewed By: mthrok Differential Revision: D43873028 Pulled By: nateanl fbshipit-source-id: a6668c47967a085e5eafb00e2dd4e61b2b46412e
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3152 In StreamWriter, attempting to write data when the destination is not opened causes a segmentation fault. This commit adds a guard so that it raises an error instead of segfaulting. Reviewed By: nateanl Differential Revision: D43852649 fbshipit-source-id: aef5db7c1508f8a7db5834c2ab6de3cad09f9d60
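The guard pattern can be sketched as follows. The `Writer` class is a hypothetical stand-in rather than the `StreamWriter` API, and the error message is invented for illustration.

```python
class Writer:
    """Hypothetical writer that must be opened before data is written."""

    def __init__(self):
        self._is_open = False
        self.written = []

    def open(self) -> None:
        self._is_open = True

    def write(self, chunk: bytes) -> None:
        # Fail loudly up front instead of touching an uninitialized destination.
        if not self._is_open:
            raise RuntimeError("Output is not opened. Call `open` before writing.")
        self.written.append(chunk)


writer = Writer()
try:
    writer.write(b"data")
except RuntimeError as error:
    print(f"Caught: {error}")
```

Checking the state at the top of the write path turns undefined behavior into an ordinary, catchable error.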
-
Maciej Torhan authored
Summary: The wav2letter example passes `momentum` to the `Adam` and `AdamW` initializers, which is not a valid parameter for them. To fix that, we need to add `beta_1` and `beta_2` to the arguments and replace `momentum` with them. I also added `eps`, similar to the `Adadelta` initializer. Pull Request resolved: https://github.com/pytorch/audio/pull/3145 Reviewed By: mthrok Differential Revision: D43847713 Pulled By: nateanl fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6
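A sketch of how the per-optimizer keyword arguments might be assembled. The keyword names (`betas`, `eps`, `rho`, `momentum`) match the `torch.optim` constructors, but the command-line flags, defaults, and helper functions below are assumptions and not the example script itself. The code only builds the argument dictionary, so it runs without PyTorch installed.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    # Flag names and defaults are hypothetical.
    parser = argparse.ArgumentParser()
    parser.add_argument("--optimizer", default="adadelta",
                        choices=["sgd", "adadelta", "adam", "adamw"])
    parser.add_argument("--learning-rate", type=float, default=0.6)
    parser.add_argument("--weight-decay", type=float, default=1e-5)
    parser.add_argument("--momentum", type=float, default=0.8)
    parser.add_argument("--rho", type=float, default=0.95)
    parser.add_argument("--beta-1", type=float, default=0.9)
    parser.add_argument("--beta-2", type=float, default=0.999)
    parser.add_argument("--eps", type=float, default=1e-8)
    return parser


def build_optimizer_kwargs(args: argparse.Namespace) -> dict:
    """Return only the keyword arguments that the chosen optimizer accepts."""
    common = {"lr": args.learning_rate, "weight_decay": args.weight_decay}
    if args.optimizer == "sgd":
        return {**common, "momentum": args.momentum}
    if args.optimizer == "adadelta":
        return {**common, "rho": args.rho, "eps": args.eps}
    if args.optimizer in ("adam", "adamw"):
        # Adam and AdamW take a pair of betas, not `momentum`.
        return {**common, "betas": (args.beta_1, args.beta_2), "eps": args.eps}
    raise ValueError(f"Unsupported optimizer: {args.optimizer}")


args = make_parser().parse_args(["--optimizer", "adam"])
print(build_optimizer_kwargs(args))
```

The resulting dictionary could then be unpacked into the matching constructor, for example `torch.optim.Adam(model.parameters(), **kwargs)`.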
-
- 06 Mar, 2023 1 commit
-
-
Moto Hira authored
Summary: After the series of simplifications, the audio and video encoding processes can be merged, which gets rid of the boilerplate code. Pull Request resolved: https://github.com/pytorch/audio/pull/3146 (Note: this ignores all push blocking failures!) Reviewed By: xiaohui-zhang Differential Revision: D43815640 fbshipit-source-id: 2a14e372b2cc75db7eeabc27d855a24c3f7d5063
-
- 04 Mar, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: The environment variable `TORCHAUDIO_TEST_ALLOW_SKIP_IF_NO_MACOS` needs to be set when running the bash script. Pull Request resolved: https://github.com/pytorch/audio/pull/3144 Reviewed By: mthrok Differential Revision: D43807178 Pulled By: nateanl fbshipit-source-id: 27c57d2efaed5519a12aa027967968895f357c67
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3143 Similar to https://github.com/pytorch/audio/pull/3140, only provide objects which are semantically related to the operation performed by AudioConverter. Reviewed By: xiaohui-zhang Differential Revision: D43781012 fbshipit-source-id: 4795e20f56272af5cfda8a5f46083e60d1890c3e
-
- 03 Mar, 2023 3 commits
-
-
moto authored
Summary: hw_device_ctx and hw_frame_ctx assigned to an AVCodecContext object are owned by libavformat, and get freed in [av_codec_free](https://ffmpeg.org/doxygen/4.1/group__lavc__core.html#gaf869d0829ed607cec3a4a02a1c7026b3) (actually in [avcodec_close](https://ffmpeg.org/doxygen/4.1/libavcodec_2utils_8c_source.html#l01069)), so we do not need to keep the reference around. Pull Request resolved: https://github.com/pytorch/audio/pull/3138 Reviewed By: nateanl Differential Revision: D43738009 Pulled By: mthrok fbshipit-source-id: 8c1f4217fa7b21dce872d12be9245056f3fc7537
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3140 https://github.com/pytorch/audio/pull/3120 introduced a regression in the GPU encoder. This happened because the source AVPixelFormat (expected channel order of input tensor) and AVCodecContext (encoding format) were previously both handled in the converter (module to copy input tensor to buffer), even though the converter does not need to know about the encoding format. This commit fixes the issue and makes sure that the converter does not receive the codec context. Reviewed By: nateanl Differential Revision: D43759162 fbshipit-source-id: f5f191cb54ecc82bd882aececdcae16921250261
-
Zhaoheng Ni authored
Summary: The `playback` function was added in https://github.com/pytorch/audio/issues/3026. The function only supports macOS, hence the tests should be skipped on other OSes. The PR skips the tests on Linux GPU machines on CircleCI. Pull Request resolved: https://github.com/pytorch/audio/pull/3141 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43760546 Pulled By: nateanl fbshipit-source-id: 606907127feee28a66f61baca000a8ef708f8086
-
- 02 Mar, 2023 5 commits
-
-
moto authored
Summary: Follow-up https://github.com/pytorch/audio/issues/3130 Pull Request resolved: https://github.com/pytorch/audio/pull/3136 Reviewed By: hwangjeff Differential Revision: D43732991 Pulled By: mthrok fbshipit-source-id: 2e8cb56d96e22546645c82eca362b3c4dcf9c78f
-
moto authored
Summary: Fix build_doc job https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8 - build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL. - Fix bash cell syntax in HW tutorial - Fix C++ doc - Fix duplicated target name in streamwriter tutorial Pull Request resolved: https://github.com/pytorch/audio/pull/3125 Reviewed By: xiaohui-zhang Differential Revision: D43724078 Pulled By: mthrok fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3130 Similar to https://github.com/pytorch/audio/pull/3120, adopt the generator-style slicing conversion in the audio encoding process. Reviewed By: nateanl Differential Revision: D43685380 fbshipit-source-id: 3e95655783e5c5d768486f8af6e6b47b0072999b
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3131 In https://github.com/pytorch/audio/pull/3122, the intermediate `num_frames` variable is removed. PTS can be incremented the same way, but the timing was wrong in #3122. This commit fixes it. Reviewed By: xiaohui-zhang Differential Revision: D43712046 fbshipit-source-id: 2fe0082969296f4f3964e62e55b5325fcd45f4f9
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3129 - Add step parameter to support audio slicing - Rename to `SlicingTensorConverter` (`Generator` is too generic.) Reviewed By: xiaohui-zhang Differential Revision: D43704926 fbshipit-source-id: c4bf0ff766e0ae1b5d46b159a6367492ef68f9cd
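The generator-style slicing with a `step` parameter can be sketched in Python. This is only an analogue of the C++ `SlicingTensorConverter`: the function name and the list-based input are hypothetical.

```python
from typing import Iterator, List, Sequence


def slice_frames(samples: Sequence[float], step: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `step` samples each."""
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, len(samples), step):
        # The last slice may be shorter than `step`.
        yield samples[start : start + step]


chunks: List[Sequence[float]] = list(slice_frames([0, 1, 2, 3, 4], step=2))
```

For video, a step of one frame would match frame-by-frame encoding, while audio can use a larger step so that each slice fills one encoder frame.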
-