- 01 Apr, 2023 1 commit
-
-
moto authored
Summary: This commit adds a new feature AudioEffector, which can be used to apply various effects and codecs to waveforms in Tensor. Under the hood it uses StreamWriter and StreamReader to apply filters and encode/decode. This is going to replace the deprecated `apply_codec` and `apply_sox_effect_tensor` functions. It can also perform online, chunk-by-chunk filtering. Tutorial to follow. closes https://github.com/pytorch/audio/issues/3161 Pull Request resolved: https://github.com/pytorch/audio/pull/3163 Reviewed By: hwangjeff Differential Revision: D44576660 Pulled By: mthrok fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb
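The online, chunk-by-chunk mode mentioned above can be pictured with a plain-Python sketch. It does not use `AudioEffector` or FFmpeg: `one_pole_lowpass` and `stream_filter` are hypothetical names, and the filter is only a stand-in for a real effect. The point is that a stateful filter fed chunk by chunk yields the same samples as filtering the whole waveform at once, provided the state is carried across chunks.

```python
from typing import Iterable, Iterator, List, Tuple


def one_pole_lowpass(
    samples: List[float], alpha: float, state: float = 0.0
) -> Tuple[List[float], float]:
    """Filter `samples` with y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    Returns the filtered samples and the final state, so that a later call
    can continue where this one stopped.
    """
    out = []
    for x in samples:
        state = state + alpha * (x - state)
        out.append(state)
    return out, state


def stream_filter(chunks: Iterable[List[float]], alpha: float) -> Iterator[List[float]]:
    """Apply the filter chunk by chunk, carrying the state between chunks."""
    state = 0.0
    for chunk in chunks:
        filtered, state = one_pole_lowpass(chunk, alpha, state)
        yield filtered


waveform = [1.0, 0.0, -1.0, 0.5, 0.25, -0.75, 0.0, 1.0]
chunks = [waveform[i : i + 3] for i in range(0, len(waveform), 3)]

# Whole waveform in one call versus three chunks of at most three samples.
offline, _ = one_pole_lowpass(waveform, alpha=0.5)
online = [y for chunk in stream_filter(chunks, alpha=0.5) for y in chunk]
```

Both paths perform the same arithmetic in the same order, so `online` and `offline` hold identical values.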
-
- 31 Mar, 2023 3 commits
-
-
Nouran Ali authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3222 Reviewed By: nateanl Differential Revision: D44539424 Pulled By: mthrok fbshipit-source-id: 8fbcb5f9918c9930c939bcd448493fa5cf604545
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3223 Each `StreamProcessor` is responsible for processing a source stream. In the case where we support packet passthrough, `StreamProcessor`'s choice of decoder is irrelevant as no decoding is performed. Currently, however, `StreamProcessor` requires decoder params and fixes a decoder at construction time. To accommodate this future packet passthrough use case, this PR decouples the construction of `StreamProcessor` from the configuration of the decoder that it uses. Reviewed By: mthrok Differential Revision: D44554934 fbshipit-source-id: 1d1a89015e1181b71dfb95c928de4fc3ec6f63b6
-
moto authored
Summary: This commit adds the equivalent of `qscale` option in FFmpeg to StreamWriter.CodecConfig. `qscale` enables variable bit rate. The following figure illustrates the difference between currently available configs. From top to bottom: original, `compression_level=1`, `compression_level=9`, `bit_rate=192k`, `bit_rate=8k`, `qscale=9`, `qscale=1`. Pull Request resolved: https://github.com/pytorch/audio/pull/3224 Reviewed By: hwangjeff Differential Revision: D44563633 Pulled By: mthrok fbshipit-source-id: ff74cd803b5abf1222f087e3e46ba7d81a35f672
-
- 30 Mar, 2023 2 commits
-
-
moto authored
Summary: This commit adds support for changing the spec of media (such as sample rate, number of channels, image size and frame rate) on-the-fly at encoding time. The motivation behind this addition is that certain media formats support only a limited number of specs, and it is cumbersome to require client code to change the spec every time. For example, OPUS supports only a 48 kHz sampling rate, and Vorbis supports only stereo. To make it easy to work with media of different formats, this commit automatically converts anything that is not compatible with the format, and allows users to specify an override. A notable implementation detail is that, for sample format and pixel format, the encoder's default value takes precedence over the source value, while for other attributes such as sample rate and number of channels, the source value takes precedence as long as it is supported. Pull Request resolved: https://github.com/pytorch/audio/pull/3207 Reviewed By: nateanl Differential Revision: D44439622 Pulled By: mthrok fbshipit-source-id: 09524f201d485d201150481884a3e9e4d2aab081
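The precedence rule described above for sample rate can be sketched as follows. This is an illustration only, not the code from the commit: `resolve_sample_rate` is a hypothetical helper, and falling back to the closest supported rate is an assumption made for this sketch.

```python
from typing import Optional, Sequence


def resolve_sample_rate(
    src_rate: int,
    supported: Sequence[int],
    override: Optional[int] = None,
) -> int:
    """Pick the sample rate to encode with.

    Precedence: explicit override, then the source value if the encoder
    supports it, then a fallback. An empty `supported` means the encoder
    accepts any rate.
    """
    if override is not None:
        if supported and override not in supported:
            raise ValueError(
                f"{override} Hz is not supported; choose from {list(supported)}"
            )
        return override
    if not supported or src_rate in supported:
        return src_rate
    # Assumption for this sketch: fall back to the closest supported rate.
    return min(supported, key=lambda rate: abs(rate - src_rate))


# A 16 kHz source sent to an encoder that only accepts 48 kHz is converted.
rate_for_48k_only = resolve_sample_rate(16000, [48000])
# A source rate that the encoder supports is kept as-is.
rate_kept = resolve_sample_rate(44100, [8000, 44100, 48000])
```

Sample format and pixel format would follow the opposite order, with the encoder default consulted before the source value.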
-
moto authored
Summary: This commit adds `num_channels` argument, which allows one to change the number of channels on-the-fly. Pull Request resolved: https://github.com/pytorch/audio/pull/3216 Reviewed By: hwangjeff Differential Revision: D44516925 Pulled By: mthrok fbshipit-source-id: 3e5a11b3fdbb19071f712a8148e27aff60341df3
-
- 29 Mar, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3217 This commit removes some tests for file-like object from StreamWriter test. The rationale is that everything tested after the output file is opened behaves the same for file-like objects and regular files. Things like filter-graph and encoder format changes do not affect how the encoded binary is written. Reviewed By: hwangjeff Differential Revision: D44518626 fbshipit-source-id: 821ec20deca92e5e5c85bf4d47997eed51735374
-
moto authored
Summary: In https://github.com/pytorch/audio/issues/3178, a mechanism to cache HW device context was introduced. This commit applies the reuse in StreamWriter, so that when using GPU video decoding and encoding, the context is shared. This gives back about 250 - 300 MB of GPU memory. --- Q: What is HW device context? From https://ffmpeg.org/doxygen/4.1/structAVHWDeviceContext.html#details > This struct aggregates all the (hardware/vendor-specific) "high-level" state, i.e. state that is not tied to a concrete processing configuration. E.g., in an API that supports hardware-accelerated encoding and decoding, this struct will (if possible) wrap the state that is common to both encoding and decoding and from which specific instances of encoders or decoders can be derived. Pull Request resolved: https://github.com/pytorch/audio/pull/3215 Reviewed By: nateanl Differential Revision: D44504051 Pulled By: mthrok fbshipit-source-id: 77579cdc8bd9e9b8a218e3f29031d091cda83860
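The sharing described above amounts to a per-device cache. A minimal sketch of that idea, with no FFmpeg or CUDA involved: `DeviceContext` and `get_device_context` are hypothetical stand-ins, not names from the torchaudio source.

```python
from typing import Dict


class DeviceContext:
    """Stand-in for an expensive, shareable per-device resource."""

    def __init__(self, device_index: int) -> None:
        self.device_index = device_index


# One context per device index, created on first use and reused afterwards.
_CONTEXT_CACHE: Dict[int, DeviceContext] = {}


def get_device_context(device_index: int) -> DeviceContext:
    """Return the cached context for `device_index`, creating it if needed."""
    context = _CONTEXT_CACHE.get(device_index)
    if context is None:
        context = _CONTEXT_CACHE[device_index] = DeviceContext(device_index)
    return context


# A decoder and an encoder on the same device end up with the same object.
decoder_context = get_device_context(0)
encoder_context = get_device_context(0)
```

Because both lookups return the same object, the resource is allocated once per device rather than once per decoder or encoder.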
-
moto authored
Summary: Part of the StreamWriter tutorial warns about corrupted AAC audio output, but this is no longer relevant, so this commit deletes it. Pull Request resolved: https://github.com/pytorch/audio/pull/3214 Reviewed By: nateanl Differential Revision: D44504030 Pulled By: mthrok fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef
-
- 28 Mar, 2023 4 commits
-
-
nateanl authored
Summary: Fix https://github.com/pytorch/audio/issues/3211 Pull Request resolved: https://github.com/pytorch/audio/pull/3212 Reviewed By: mthrok Differential Revision: D44472523 Pulled By: nateanl fbshipit-source-id: eb519b0045e7518ad13863a53271745a80d89a21
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3194 Reviewed By: hwangjeff Differential Revision: D44283910 Pulled By: mthrok fbshipit-source-id: 49125724896bf7190ec27f056b6bfef260019f8e
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3208 StreamReader/Writer is evolving and the number of arguments in add_stream methods is growing. This commit adds default values to these arguments. Reviewed By: hwangjeff Differential Revision: D44447263 fbshipit-source-id: e1c09956d78c2b4738bbeafb88195ec8e8ca5513
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3209 The previous code prints out the uninitialized variable when invalid HW acceleration is provided. This commit fixes it. Reviewed By: hwangjeff Differential Revision: D44449715 fbshipit-source-id: 8b76cfc27816d5ea9fbc2bc37a3148f09a8ed6ed
-
- 27 Mar, 2023 2 commits
-
-
hwangjeff authored
Summary: For `StreamWriter`, * Renames arg `config` to `codec_config`. * Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`. * Adds docstrings for arg `codec_config`. * Updates `chunk` to `frames` in `write_*_chunk` methods. Pull Request resolved: https://github.com/pytorch/audio/pull/3203 Reviewed By: mthrok Differential Revision: D44350153 Pulled By: hwangjeff fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3205 This commit refactors the initialization of EncodeProcess. Interface-wise, the signature of the constructor of EncodeProcess has been made simpler, taking just rvalues of its components, and the initialization of the components has been moved to helper functions. Implementation-wise, the order in which the components are initialized is revised, and the source of initialization parameters is also revised. For example, the original implementation first creates AVCodecContext, and passes it around to create the other components. This relied on an assumption that the parameters AVCodecContext has (such as image size and sample rate) are the same as those of the source data. This is not always right, and as we will introduce custom filter graph and allow on-the-fly transform of rates and dimensions, it will become even less correct. The new initialization constructs the source AVFrame, TensorConverter and FilterGraph from source attributes. This makes it easy to introduce on-the-fly transform. Reviewed By: nateanl Differential Revision: D44360650 fbshipit-source-id: bf0e77dc1a5a40fc8e9870c50d07339d812762e8
-
- 25 Mar, 2023 1 commit
-
-
moto authored
Summary: Some audio encoders expect a specific, exact number of samples, as described in `AVCodecContext.frame_size`. `AVFrame.nb_samples` is set for the frames passed to `AVFilterGraph`, but frames coming out of the graph do not necessarily have the same number of samples. This causes issues with encoding OPUS (among others). This commit fixes it by inserting `asetnsamples` into the filter graph if a fixed number of samples is requested. Note: It turned out that FFmpeg 4.1 has an issue with OPUS encoding. It does not properly discard some samples. We should probably move the minimum required FFmpeg to 4.2, but I am not sure if we can enforce it via ABI. A workaround will be to issue a warning if encoding OPUS with 4.1. (follow-up) Pull Request resolved: https://github.com/pytorch/audio/pull/3204 Reviewed By: nateanl Differential Revision: D44374668 Pulled By: mthrok fbshipit-source-id: 10ef5333dc0677dfb83c8e40b78edd8ded1b21dc
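What a fixed-size regrouping step does to a stream of variable-length frames can be sketched in plain Python. This mirrors the idea behind a filter such as `asetnsamples`, not its implementation: `set_n_samples` is a hypothetical name, and zero-padding the final frame is a choice made for this sketch.

```python
from typing import Iterable, Iterator, List


def set_n_samples(
    frames: Iterable[List[float]], n: int, pad: bool = True
) -> Iterator[List[float]]:
    """Regroup variable-length frames into frames of exactly `n` samples.

    The last frame is zero-padded to `n` samples when `pad` is true,
    otherwise it is emitted short.
    """
    pending: List[float] = []
    for frame in frames:
        pending.extend(frame)
        # Emit as many full frames as the pending samples allow.
        while len(pending) >= n:
            yield pending[:n]
            pending = pending[n:]
    if pending:
        if pad:
            pending = pending + [0.0] * (n - len(pending))
        yield pending


# Frames of 3, 1 and 5 samples regrouped into frames of 4 samples.
incoming = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0, 7.0, 8.0, 9.0]]
fixed = list(set_n_samples(incoming, n=4))
```

An encoder that requires a fixed frame size can then consume `fixed` directly, since every frame has the requested length.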
-
- 23 Mar, 2023 6 commits
-
-
Scott Wolchok authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3198 Fixes build after following diff to use F14 maps in pickler internally. Reviewed By: mthrok Differential Revision: D44098387 fbshipit-source-id: 9777517369d9a3f2599b273c04bf4a014f411f12
-
moto authored
Summary: With the support of CUDA filter in https://github.com/pytorch/audio/issues/3183, it is now possible to change the pixel format of CUDA frame. This commit adds conversion for YUV444P format. Pull Request resolved: https://github.com/pytorch/audio/pull/3199 Reviewed By: hwangjeff Differential Revision: D44323928 Pulled By: mthrok fbshipit-source-id: 6d9b205e7235df5f21e7d3e06166b3a169f1ae9f
-
Zhaoheng Ni authored
Summary: The PR adds the pre-trained pipeline for `SquimSubjective` model which predicts MOS score for speech enhancement task. Pull Request resolved: https://github.com/pytorch/audio/pull/3197 Reviewed By: mthrok Differential Revision: D44313244 Pulled By: nateanl fbshipit-source-id: 905095ff77006e9f441faa826fc25d9d8681e8aa
-
moto authored
Summary: StreamReader behaves differently when dealing with YUV formats. It implicitly converts the image format to YUV444P because otherwise image planes do not have the same shape and it is not possible to express it as a regular PyTorch Tensor with dedicated dimension for each color channel. This commit adds warnings for such conversions. Pull Request resolved: https://github.com/pytorch/audio/pull/3201 Reviewed By: nateanl Differential Revision: D44311017 Pulled By: mthrok fbshipit-source-id: 73a02a19c013c0263f349e1f3a3603e3d3eddb6a
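The shape mismatch behind this conversion can be illustrated with nested lists. In a 4:2:0 format the chroma planes have half the width and half the height of the luma plane, so the three planes cannot be stacked until chroma is upsampled. The sketch below uses nearest-neighbour repetition purely for simplicity, which is an assumption and not necessarily how the conversion is performed in StreamReader; the function names are hypothetical.

```python
from typing import List

Plane = List[List[int]]


def upsample_chroma(plane: Plane, factor: int = 2) -> Plane:
    """Repeat every value `factor` times horizontally and vertically."""
    out: Plane = []
    for row in plane:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def yuv420p_to_yuv444p(y: Plane, u: Plane, v: Plane) -> List[Plane]:
    """Return [Y, U, V] where all three planes share the shape of Y."""
    return [y, upsample_chroma(u), upsample_chroma(v)]


# A 4x4 luma plane with 2x2 chroma planes.
y_plane = [[16] * 4 for _ in range(4)]
u_plane = [[10, 20], [30, 40]]
v_plane = [[50, 60], [70, 80]]

stacked = yuv420p_to_yuv444p(y_plane, u_plane, v_plane)
shapes = [(len(plane), len(plane[0])) for plane in stacked]
```

After the conversion every plane is 4x4, so the result maps onto a single array with one dimension per color channel.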
-
Zhaoheng Ni authored
Summary: In the nightly documentation, "Prototype Factory Functions of Beta Models" is listed as an individual section, which is not correct. After the PR, the section outline is fixed. Pull Request resolved: https://github.com/pytorch/audio/pull/3202 Reviewed By: mthrok Differential Revision: D44338663 Pulled By: nateanl fbshipit-source-id: 09f591b9e4af66ebf34fb423bd5c30d4630f0b88
-
moto authored
Summary: OPUS and VORBIS encoders require the "strict=experimental" flag. This commit enables it automatically. The rationale is that we typically care whether we can encode these formats at all, not how they are encoded. (This might become a concern when these encoders become more mature on the FFmpeg side and providing the flag results in weird behavior.) Also, when writing high-level functions that use StreamWriter, if we do not set these flags, then these high-level functions have to add new options that are passed down to StreamWriter, which turned out to be very painful in https://github.com/pytorch/audio/issues/3163 Pull Request resolved: https://github.com/pytorch/audio/pull/3192 Reviewed By: nateanl Differential Revision: D44275089 Pulled By: mthrok fbshipit-source-id: 74a757b4b7fc8467c8c88ffcb54fbaf89d6e4384
-
- 22 Mar, 2023 2 commits
-
-
moto authored
Summary: Follow up of https://github.com/pytorch/audio/pull/3083 Pull Request resolved: https://github.com/pytorch/audio/pull/3196 Reviewed By: nateanl Differential Revision: D44308940 Pulled By: mthrok fbshipit-source-id: e3ef27656e74c28ae78b767517d8e0ba3a9ac4a6
-
atalman authored
Summary: Adopt ffmpeg build to be executed from github actions for windows Tested by manually invoking this script: ``` c:\actions-runner\_work\test-infra\test-infra\pytorch\audio Chocolatey v1.2.1 Installing the following packages: msys2 By installing, you accept licenses for the packages. msys2 v20230318.0.0 already installed. Use --force to reinstall, specify a version to install, or try upgrade. Chocolatey installed 0/1 packages. See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log). Warnings: - msys2 - msys2 v20230318.0.0 already installed. Use --force to reinstall, specify a version to install, or try upgrade. Did you know the proceeds of Pro (and some proceeds from other licensed editions) go into bettering the community infrastructure? Your support ensures an active community, keeps Chocolatey tip-top, plus it nets you some awesome features! https://chocolatey.org/compare warning: base-devel-...
-
- 21 Mar, 2023 6 commits
-
-
Zhaoheng Ni authored
Summary: Add model architecture and factory functions for `SquimSubjective` which predicts subjective evaluation metric scores (e.g. MOS) for speech enhancement task. Pull Request resolved: https://github.com/pytorch/audio/pull/3189 Reviewed By: mthrok Differential Revision: D44267255 Pulled By: nateanl fbshipit-source-id: f8060398b14c625b38ea1bb2417f61aeaec3f1db
-
moto authored
Summary: To suppress local warning of flake8 <120 Pull Request resolved: https://github.com/pytorch/audio/pull/3191 Reviewed By: nateanl Differential Revision: D44263027 Pulled By: mthrok fbshipit-source-id: b3e48dba21fc5c9813f07e624a93f38a68956c6e
-
moto authored
Summary: `oscillator_bank` performs cumsum over a large number of elements, and float32 precision is typically not good enough. This PR makes the cumsum operation default to float64, so that the result is more accurate. Pull Request resolved: https://github.com/pytorch/audio/pull/3083 Reviewed By: nateanl Differential Revision: D44257182 Pulled By: mthrok fbshipit-source-id: a38a465d33559a415e8c744e61292f4fab64b0e1
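The precision problem can be reproduced without PyTorch. The sketch below accumulates a constant phase increment, once with every intermediate value rounded to float32 (emulated with `struct`) and once in Python's native float64. The 440 Hz tone, the 16 kHz sample rate and the 10 second duration are arbitrary values chosen for illustration, and the helper names are hypothetical.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single_precision: bool) -> float:
    """Add `step` to a running total `n` times, as a cumulative sum would."""
    if single_precision:
        step = to_float32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single_precision:
            # Emulate a float32 accumulator by rounding after every addition.
            total = to_float32(total)
    return total


sample_rate = 16000
num_samples = 10 * sample_rate
step = 2 * math.pi * 440.0 / sample_rate  # phase increment per sample

reference = step * num_samples
err32 = abs(accumulate(step, num_samples, single_precision=True) - reference)
err64 = abs(accumulate(step, num_samples, single_precision=False) - reference)
```

Comparing `err32` with `err64` shows the float32 accumulator drifting away from the reference phase by far more than the float64 one.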
-
moto authored
Summary: Fixes the issue https://app.circleci.com/pipelines/github/pytorch/audio/15501/workflows/ebaa2c87-efc3-44a8-b86d-5a3b99870588/jobs/1164478 Pull Request resolved: https://github.com/pytorch/audio/pull/3190 Reviewed By: nateanl Differential Revision: D44263564 Pulled By: mthrok fbshipit-source-id: e610be3a91888c859ebdc31081b2d1ba9d61737e
-
Zhaoheng Ni authored
Summary: In librosa 0.10 release, positional arguments are deprecated (see https://github.com/librosa/librosa/pull/1521 for details). The PR fixes the HiFiGAN unit test by using keyword arguments for `librosa.filters.mel` function. Pull Request resolved: https://github.com/pytorch/audio/pull/3185 Reviewed By: mthrok Differential Revision: D44218852 Pulled By: nateanl fbshipit-source-id: 6171f7bec6a2144917697c1d640e701d95ec60d7
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3188 Refactor the process after decoding in StreamReader. The post-decode process consists of three parts: 1. preprocessing using FilterGraph 2. conversion to Tensor 3. storage in Buffer. The FilterGraph class is a thin wrapper around the AVFilterGraph structure from FFmpeg and is agnostic to media type. However, Tensor conversion and buffering consist of a number of different logics. Currently, the conversion process is abstracted away with a template, i.e. `template<typename Conversion> Buffer`, and the whole process is implemented in the Sink class, which consists of `FilterGraph` and `Buffer`, the latter internally containing the conversion logic, even though conversion logic and buffer have nothing in common and are better separated logically. The new implementation replaces the `Sink` class with the `IPostDecodeProcess` interface, which contains the three components. The different post-processing is implemented as a template argument of the actual implementation, i.e. ```c++ template<typename Converter, typename Buffer> ProcessImpl : IPostDecodeProcess ``` and stored as `unique_ptr<IPostDecodeProcess>` on `StreamProcessor`. (This is the [functionoid pattern](https://isocpp.org/wiki/faq/pointers-to-members#functionoids), which eliminates all the branching based on the media format.) Note: This implementation was not possible in the initial version of StreamReader, as there was no way of knowing the media attributes coming out of `AVFilterGraph`. https://github.com/pytorch/audio/pull/3155 and https://github.com/pytorch/audio/pull/3183 added features to parse it properly, so we can finally make the post processing strongly-typed. Reviewed By: hwangjeff Differential Revision: D44242647 fbshipit-source-id: 96b8c6c72a2b8af4fa86a9b02292c65078ee265b
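The structure described above (one media-agnostic interface, with the filter, converter and buffer supplied as separate parts) can be pictured in Python. The real code is C++ and uses templates, so this is only an analogy: the class names are borrowed from the commit message, while the method names, constructor arguments and the toy audio converter are assumptions made for this sketch.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Any, Callable, Deque, Iterable, List


class IPostDecodeProcess(ABC):
    """What the stream processor sees: one media-agnostic entry point."""

    @abstractmethod
    def process(self, frame: Any) -> None: ...

    @abstractmethod
    def pop(self) -> List[Any]: ...


class ProcessImpl(IPostDecodeProcess):
    """Filter, convert, then buffer, with the media-specific parts injected."""

    def __init__(
        self,
        filter_fn: Callable[[Any], Iterable[Any]],
        converter: Callable[[Any], Any],
        max_len: int,
    ) -> None:
        self._filter = filter_fn
        self._convert = converter
        self._buffer: Deque[Any] = deque(maxlen=max_len)

    def process(self, frame: Any) -> None:
        for filtered in self._filter(frame):  # 1. preprocessing
            converted = self._convert(filtered)  # 2. conversion
            self._buffer.append(converted)  # 3. storage

    def pop(self) -> List[Any]:
        items = list(self._buffer)
        self._buffer.clear()
        return items


# A toy audio pipeline: pass-through filter, int16 samples scaled to floats.
audio: IPostDecodeProcess = ProcessImpl(
    filter_fn=lambda frame: [frame],
    converter=lambda frame: [sample / 32768 for sample in frame],
    max_len=4,
)
audio.process([0, 16384, -32768])
popped = audio.pop()
```

The caller only ever touches `process` and `pop`, so it needs no branching on the media type.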
-
- 20 Mar, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3184 Tweak internals of StreamReader 1. Pass time_base to Buffer class so that * no need to pass frame_duration separately * Conversion of PTS to double type can be delayed until when it's popped 2. Merge `get_output_timebase` method into `get_output_stream_info`. 3. If filter description is not provided, fill in null filter at top-level StreamReader 4. Expose filter and filter description from Sink class to get rid of wrapper get methods. Reviewed By: nateanl Differential Revision: D44207976 fbshipit-source-id: f25ac9be69c9897e9dcec0c6e978f29b83b166e8
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3186 Fix the GPU memory leak introduced in https://github.com/pytorch/audio/pull/3183 The HW frames context is owned by AVCodecContext. The removed `av_buffer_ref` call incremented the reference count unnecessarily, and prevented AVCodecContext from freeing the resource. (Note: this ignores all push blocking failures!) Reviewed By: nateanl Differential Revision: D44231876 fbshipit-source-id: 9be2c33049dd02a3fa82a85271de7fb62e5b09ea
-
moto authored
Summary: This commit adds CUDA frame support to FilterGraph. It initializes and attaches CUDA frames context to FilterGraph, so that CUDA frames can be processed in FilterGraph. As a result, it enables 1. CUDA filter support such as `scale_cuda` 2. Properly retrieve the pixel format coming out of FilterGraph when CUDA HW acceleration is enabled. (currently it is reported as "cuda") Resolves https://github.com/pytorch/audio/issues/3159 Pull Request resolved: https://github.com/pytorch/audio/pull/3183 Reviewed By: hwangjeff Differential Revision: D44183722 Pulled By: mthrok fbshipit-source-id: 522d21039c361ddfaa87fa89cf49c19d210ac62f
-
- 17 Mar, 2023 4 commits
-
-
moto authored
Summary: TODO: add cache release Pull Request resolved: https://github.com/pytorch/audio/pull/3178 Reviewed By: hwangjeff Differential Revision: D44136275 Pulled By: mthrok fbshipit-source-id: 4eaf646fe17a469e8bbbdf43441d5532f9f8461d
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3181 Reviewed By: nateanl Differential Revision: D44167788 Pulled By: mthrok fbshipit-source-id: 375293df836456adc40020d323efbc0aebc60d83
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3182 Reviewed By: nateanl Differential Revision: D44167810 Pulled By: mthrok fbshipit-source-id: 6ecbae54224ef7ba32835e4006aa5f2dc16b9acb
-
moto authored
Summary: Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level. Pull Request resolved: https://github.com/pytorch/audio/pull/3179 Pull Request resolved: https://github.com/pytorch/audio/pull/3164 Reviewed By: mthrok Differential Revision: D43861413 Pulled By: hwangjeff fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161
-
- 16 Mar, 2023 2 commits
-
-
jiyuntu-eero authored
Summary: Fix https://github.com/pytorch/audio/issues/3166. In `get_trellis` method, the index of blank symbol is regarded as 0 by default. It should be changed to `blank_id`. Pull Request resolved: https://github.com/pytorch/audio/pull/3172 Reviewed By: mthrok Differential Revision: D44090889 Pulled By: nateanl fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264
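The effect of hard-coding the blank index can be seen in a small pure-Python trellis. This is a simplified formulation written for illustration, not the `get_trellis` code touched by the fix: at each frame a path either stays on the current token by emitting blank, or advances to the next token by emitting it. The emission values are made up.

```python
import math
from typing import List, Sequence


def get_trellis(
    emission: List[List[float]], tokens: Sequence[int], blank_id: int = 0
) -> List[List[float]]:
    """Best log-score of reaching token j at frame t.

    `emission[t][c]` is the log-probability of class c at frame t.
    """
    num_frames, num_tokens = len(emission), len(tokens)
    trellis = [[-math.inf] * num_tokens for _ in range(num_frames)]
    trellis[0][0] = emission[0][tokens[0]]
    for t in range(1, num_frames):
        for j in range(num_tokens):
            # Stay on token j by emitting the blank symbol.
            stay = trellis[t - 1][j] + emission[t][blank_id]
            # Advance from token j-1 by emitting token j.
            move = (
                trellis[t - 1][j - 1] + emission[t][tokens[j]]
                if j > 0
                else -math.inf
            )
            trellis[t][j] = max(stay, move)
    return trellis


# Three classes; in this made-up vocabulary the blank symbol is class 2.
emission = [
    [-1.0, -2.0, -3.0],
    [-1.5, -0.5, -0.25],
    [-0.25, -2.0, -1.0],
]
tokens = [0, 1]

score_correct = get_trellis(emission, tokens, blank_id=2)[-1][-1]
score_assuming_zero = get_trellis(emission, tokens)[-1][-1]
```

With the blank at index 2, treating index 0 as blank scores the "stay" transitions with the wrong column, so the two final scores differ.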
-
moto authored
Summary: Currently, when the Buffer converts AVFrame* to torch::Tensor, it checks the format each time a frame is passed, and performs the conversion. This commit changes it so that the conversion operation is pre-instantiated at the time the output stream is configured. It introduces Converter implementations for various formats, and uses templates to embed them in the Buffer class. This way, branching such as if/switch is eliminated from the decoding path. Pull Request resolved: https://github.com/pytorch/audio/pull/3170 Reviewed By: xiaohui-zhang Differential Revision: D44048293 Pulled By: mthrok fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f
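The change from per-frame format checks to a converter chosen once can be sketched in Python. The commit does this in C++ with templates; here a function reference bound at construction time plays the same role. `CONVERTERS`, the two converter functions and this `Buffer` class are hypothetical and cover only two sample formats.

```python
import struct
from typing import Callable, Dict, List

Converter = Callable[[bytes], List[float]]


def convert_u8(data: bytes) -> List[float]:
    """Unsigned 8-bit samples, centred on 128, scaled to [-1, 1)."""
    return [(value - 128) / 128 for value in data]


def convert_s16(data: bytes) -> List[float]:
    """Little-endian signed 16-bit samples scaled to [-1, 1)."""
    count = len(data) // 2
    return [value / 32768 for value in struct.unpack(f"<{count}h", data)]


CONVERTERS: Dict[str, Converter] = {"u8": convert_u8, "s16": convert_s16}


class Buffer:
    def __init__(self, sample_format: str) -> None:
        # Resolved once, when the stream is configured.
        self._convert = CONVERTERS[sample_format]
        self.chunks: List[List[float]] = []

    def push(self, data: bytes) -> None:
        # No format check on the per-frame path.
        self.chunks.append(self._convert(data))


s16_buffer = Buffer("s16")
s16_buffer.push(struct.pack("<3h", 0, 16384, -32768))

u8_buffer = Buffer("u8")
u8_buffer.push(bytes([128, 192, 0]))
```

An unsupported format fails at construction (a `KeyError` here) instead of surfacing in the middle of decoding.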
-
- 15 Mar, 2023 1 commit
-
-
Carl Parker authored
Summary: - Boldface the version-selection UX and increase size by three percent. - Add text to breadcrumbs to indicate version and stability. - New `breadcrumbs.html` in `_templates` overrides Sphinx version. I create a new variable in `conf.py`, **version_stable**, which has the version number for the most-recent stable release. I define this variable in the **html_context** dictionary so that it is visible to the templates. I use this approach because I was not able to find any other way of discerning the current stable release during the build. Note that the `versions.html` file--which identifies the current stable release--appears to be available only in the **gh-pages** branch and so it is not available at build time. However, this means that someone will need to update `conf.py` whenever the current stable release changes. Pull Request resolved: https://github.com/pytorch/audio/pull/3167 Reviewed By: mthrok Differential Revision: D44112224 Pulled By: carljparker fbshipit-source-id: e76f5cb6734a784d161342964459577aa9b64cac
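The `conf.py` arrangement described above might look like the fragment below. `html_context` is a standard Sphinx configuration value whose entries are passed to the templates; the version string shown is a placeholder, not the value used in the repository.

```python
# conf.py (fragment)

# Most recent stable release. Placeholder value: this has to be updated by
# hand whenever a new stable release is published.
version_stable = "2.0.0"

# Entries of `html_context` are made available to the HTML templates, so a
# template such as `_templates/breadcrumbs.html` can refer to `version_stable`.
html_context = {
    "version_stable": version_stable,
}
```

Keeping the value in a single variable means the manual update mentioned in the commit message touches only one line.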
-