- 20 Mar, 2023 1 commit
moto authored
Summary: This commit adds CUDA frame support to FilterGraph. It initializes and attaches a CUDA frames context to FilterGraph, so that CUDA frames can be processed in FilterGraph. As a result, it enables 1. CUDA filters such as `scale_cuda`, and 2. proper retrieval of the pixel format coming out of FilterGraph when CUDA HW acceleration is enabled (currently it is reported as "cuda"). Resolves https://github.com/pytorch/audio/issues/3159 Pull Request resolved: https://github.com/pytorch/audio/pull/3183 Reviewed By: hwangjeff Differential Revision: D44183722 Pulled By: mthrok fbshipit-source-id: 522d21039c361ddfaa87fa89cf49c19d210ac62f
- 17 Mar, 2023 1 commit
moto authored
Summary: Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level. Pull Request resolved: https://github.com/pytorch/audio/pull/3179 Pull Request resolved: https://github.com/pytorch/audio/pull/3164 Reviewed By: mthrok Differential Revision: D43861413 Pulled By: hwangjeff fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161
- 16 Mar, 2023 1 commit
moto authored
Summary: Currently, when the Buffer converts AVFrame* to torch::Tensor, it checks the format each time a frame is passed and performs the conversion. This commit changes it so that the conversion operation is pre-instantiated at the time the output stream is configured. It introduces Converter implementations for various formats and uses templates to embed them in the Buffer class. This way, branching such as if/switch is eliminated from the decoding path. Pull Request resolved: https://github.com/pytorch/audio/pull/3170 Reviewed By: xiaohui-zhang Differential Revision: D44048293 Pulled By: mthrok fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f
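The idea of resolving the conversion once, at configuration time, can be sketched in Python. This is a minimal analog under stated assumptions: the formats, the list-of-ints "frames", and the `CONVERTERS` table and `Buffer` class below are hypothetical stand-ins. Where the commit uses C++ templates to fix the converter at compile time, the sketch simply binds a callable when the buffer is constructed.

```python
from typing import Callable, Dict, List

Frame = List[int]
Planes = List[List[int]]


# Hypothetical converters standing in for the C++ Converter implementations.
def convert_gray(frame: Frame) -> Planes:
    # Single channel: wrap the samples as one plane.
    return [list(frame)]


def convert_rgb(frame: Frame) -> Planes:
    # Interleaved RGB (R, G, B, R, G, B, ...) -> three separate planes.
    return [frame[0::3], frame[1::3], frame[2::3]]


CONVERTERS: Dict[str, Callable[[Frame], Planes]] = {
    "gray": convert_gray,
    "rgb24": convert_rgb,
}


class Buffer:
    """Hypothetical buffer that picks its converter once, at configuration time."""

    def __init__(self, pix_fmt: str) -> None:
        try:
            self._convert = CONVERTERS[pix_fmt]
        except KeyError:
            raise ValueError(f"Unsupported format: {pix_fmt}") from None
        self.chunks: List[Planes] = []

    def push_frame(self, frame: Frame) -> None:
        # Hot path: no per-frame format checks, just call the bound converter.
        self.chunks.append(self._convert(frame))
```

Because the lookup happens in `__init__`, an unsupported format fails when the stream is configured rather than on the first decoded frame.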
- 15 Mar, 2023 1 commit
Zhaoheng Ni authored
Summary: The autograd test for the MFCC transform fails randomly. Fix it by increasing `nondet_tol` to `1e-10`. Pull Request resolved: https://github.com/pytorch/audio/pull/3169 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D44069673 Pulled By: nateanl fbshipit-source-id: addafefe381104e778b09bfbaafb322df1d9054c
- 08 Mar, 2023 2 commits
moto authored
Summary: This commit adds fields to OutputStream that show the result of filters, such as the width and height after filtering. Before:
```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```
After:
```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```
Pull Request resolved: https://github.com/pytorch/audio/pull/3155 Reviewed By: nateanl Differential Revision: D43882399 Pulled By: mthrok fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3135 Reviewed By: xiaohui-zhang Differential Revision: D43724273 Pulled By: mthrok fbshipit-source-id: 9b52823618948945a26e57d5b3deccbf5f9268c1
- 07 Mar, 2023 3 commits
Zhaoheng Ni authored
Summary: The `filtfilt` function uses `lfilter`, which calls the `conv_1d` operation internally. `conv_1d` can be nondeterministic on CUDA, so it is expected to cause autograd test failures (see https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html). This PR enables deterministic algorithms in the autograd tests so that the `filtfilt`-related tests pass. Pull Request resolved: https://github.com/pytorch/audio/pull/3150 Reviewed By: mthrok Differential Revision: D43872977 Pulled By: nateanl fbshipit-source-id: c3d6ec281f34db8a7092526ccb245797bf2338da
Zhaoheng Ni authored
Summary: An autograd test randomly failed on a GPU Linux machine. Increase `nondet_tol` to make it pass. Pull Request resolved: https://github.com/pytorch/audio/pull/3154 Reviewed By: mthrok Differential Revision: D43873028 Pulled By: nateanl fbshipit-source-id: a6668c47967a085e5eafb00e2dd4e61b2b46412e
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3152 In StreamWriter, if the destination is not opened when attempting to write data, a segmentation fault occurs. This commit adds a guard so that an error is raised instead of a segfault. Reviewed By: nateanl Differential Revision: D43852649 fbshipit-source-id: aef5db7c1508f8a7db5834c2ab6de3cad09f9d60
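A minimal sketch of this kind of guard, assuming a simple "is open" flag. `WriterSketch` and `write_chunk` are hypothetical names; they do not reflect StreamWriter's actual API or internals.

```python
from typing import Any, List


class WriterSketch:
    """Hypothetical writer that refuses to write before its output is opened."""

    def __init__(self) -> None:
        self._is_open = False
        self.written: List[Any] = []

    def open(self) -> None:
        self._is_open = True

    def write_chunk(self, chunk: Any) -> None:
        # Guard: raise a Python error instead of touching an unopened output.
        if not self._is_open:
            raise RuntimeError("Output is not opened. Call open() before writing.")
        self.written.append(chunk)
```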
- 02 Mar, 2023 1 commit
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3131 In https://github.com/pytorch/audio/pull/3122, the intermediate `num_frames` variable was removed. PTS can be incremented the same way, but the timing was wrong in #3122. This commit fixes it. Reviewed By: xiaohui-zhang Differential Revision: D43712046 fbshipit-source-id: 2fe0082969296f4f3964e62e55b5325fcd45f4f9
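To illustrate why the timing matters, here is a minimal sketch that advances a PTS counter by the number of frames in each chunk. It assumes PTS is counted in frames starting at 0, and `stamp_chunks` is a hypothetical helper; the summary does not say exactly which ordering was wrong in #3122.

```python
from typing import List, Tuple


def stamp_chunks(chunk_sizes: List[int]) -> List[Tuple[int, int]]:
    """Return (pts, num_frames) for each chunk, with PTS counted in frames."""
    pts = 0
    stamped = []
    for num_frames in chunk_sizes:
        stamped.append((pts, num_frames))  # record the current PTS first...
        pts += num_frames                  # ...then advance it for the next chunk
    return stamped
```

Incrementing before recording would shift every chunk by one chunk length, so the first chunk of `[1024, 1024, 512]` would start at 1024 instead of 0.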
- 01 Mar, 2023 1 commit
Zhaoheng Ni authored
Summary: `sox` is not available on Windows machines. Add skip decorators to the sox-related tests so that they are skipped on Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/3119 Reviewed By: mthrok Differential Revision: D43682754 Pulled By: nateanl fbshipit-source-id: f69987dac8232a3569be83f096b32389bd8bda81
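One way to express such a skip with only the standard library is `unittest.skipIf` keyed on `sys.platform`. This is a sketch: the decorator name `skipIfWindows` and the test class are hypothetical, and torchaudio's test suite may use its own helpers.

```python
import sys
import unittest

# Hypothetical decorator: skip the decorated test when running on Windows.
skipIfWindows = unittest.skipIf(
    sys.platform == "win32", "sox is not available on Windows"
)


class SoxRelatedTest(unittest.TestCase):
    @skipIfWindows
    def test_something_with_sox(self):
        # Placeholder body; a real test would exercise sox functionality.
        self.assertTrue(True)
```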
- 27 Feb, 2023 1 commit
Zhaoheng Ni authored
Summary: Add pre-trained pipeline support for the `SquimObjective` model. The pre-trained model is trained on the DNS 2020 challenge dataset. Pull Request resolved: https://github.com/pytorch/audio/pull/3103 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43611794 Pulled By: nateanl fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d
- 25 Feb, 2023 1 commit
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3099 Reviewed By: mthrok Differential Revision: D43596866 Pulled By: nateanl fbshipit-source-id: 43a139bf8ebdf3261414e2855aefc3b53df298ac
- 23 Feb, 2023 1 commit
mthrok authored
Summary: Remove the Tensor input support from StreamReader. Follow-up to https://github.com/pytorch/audio/pull/3086 Pull Request resolved: https://github.com/pytorch/audio/pull/3093 Reviewed By: xiaohui-zhang Differential Revision: D43526066 Pulled By: mthrok fbshipit-source-id: 57ba4866c413649173e1c2c3b23ba7de3231b7bc
- 22 Feb, 2023 1 commit
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3042 Reviewed By: mthrok Differential Revision: D43405932 Pulled By: nateanl fbshipit-source-id: 88f6dabae35565b699230e9909b8f68f4a57f5c7
- 17 Feb, 2023 1 commit
hwangjeff authored
Summary: Makes lengths input optional for `torchaudio.functional.speed`, `torchaudio.transforms.Speed`, and `torchaudio.transforms.SpeedPerturbation`. Pull Request resolved: https://github.com/pytorch/audio/pull/3072 Reviewed By: nateanl, mthrok Differential Revision: D43371406 Pulled By: hwangjeff fbshipit-source-id: ecb38bcc2bfff5c5a396a37eff238b22238e795a
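A sketch of what making `lengths` optional means for the length bookkeeping. It assumes (this is not taken from the source) that a speed change by `factor` shortens `n` samples to `ceil(n / factor)`. The helper `speed_lengths_sketch` is hypothetical and works on plain integers rather than tensors.

```python
import math
from typing import List, Optional, Tuple


def speed_lengths_sketch(
    num_samples: int, factor: float, lengths: Optional[List[int]] = None
) -> Tuple[int, Optional[List[int]]]:
    """Hypothetical output-length bookkeeping for a speed change."""
    out_samples = math.ceil(num_samples / factor)
    if lengths is None:
        # No per-example lengths: every item is treated as full length,
        # so there are no output lengths to report.
        return out_samples, None
    return out_samples, [math.ceil(n / factor) for n in lengths]
```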
- 16 Feb, 2023 1 commit
hwangjeff authored
Summary: Adds an I/O backend dispatcher that routes I/O requests to the FFmpeg, SoX, or Soundfile backend, depending on library availability. Users can specify the backend to use, i.e. one of `["ffmpeg", "sox", "soundfile"]`, via a keyword argument; FFmpeg is the default. The environment variable `TORCHAUDIO_USE_BACKEND_DISPATCHER` gates enablement of the dispatcher: only if `TORCHAUDIO_USE_BACKEND_DISPATCHER` is explicitly set to `1` does importing TorchAudio make it accessible via `torchaudio.info`, `torchaudio.load`, and `torchaudio.save`. Pull Request resolved: https://github.com/pytorch/audio/pull/3015 Reviewed By: mthrok Differential Revision: D43258649 Pulled By: hwangjeff fbshipit-source-id: 8f12e4e56b9fa3f0814dd3fed3e1783ab23a53a1
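The gating and routing described above can be sketched as follows. The function names are hypothetical, and the fallback order after FFmpeg (SoX, then Soundfile) is an assumption; the summary only states that FFmpeg is the default and that routing depends on library availability.

```python
import os
from typing import Mapping, Optional

BACKENDS = ("ffmpeg", "sox", "soundfile")  # assumed preference order, FFmpeg first


def dispatcher_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """The dispatcher is used only if the variable is explicitly set to "1"."""
    env = os.environ if env is None else env
    return env.get("TORCHAUDIO_USE_BACKEND_DISPATCHER") == "1"


def select_backend(available: Mapping[str, bool], backend: Optional[str] = None) -> str:
    """Hypothetical routing: honor an explicit request, else take the first available."""
    if backend is not None:
        if backend not in BACKENDS:
            raise ValueError(f"Unknown backend: {backend!r}")
        if not available.get(backend, False):
            raise RuntimeError(f"Backend {backend!r} is not available")
        return backend
    for name in BACKENDS:
        if available.get(name, False):
            return name
    raise RuntimeError("No I/O backend is available")
```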
- 15 Feb, 2023 2 commits
Cole Li authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3056 Task #2 from https://github.com/pytorch/audio/issues/2835 Reviewed By: mthrok Differential Revision: D42854156 fbshipit-source-id: e1b3bd992c91fedc55f30a814e16efd7c51e0c80
hwangjeff authored
Summary: Relaxes the input dimension matching constraint on `convolve` to enable broadcasting of inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/3061 Reviewed By: mthrok Differential Revision: D43298078 Pulled By: hwangjeff fbshipit-source-id: a6cc36674754523b88390fac0a05f06562921319
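To make broadcasting of inputs concrete, the following sketch computes the output shape of a convolution over the last dimension when the leading dimensions are broadcast. It assumes NumPy-style broadcasting for the leading dimensions and a "full" convolution whose last dimension is `N + M - 1`; `convolve_output_shape` is a hypothetical helper, not part of torchaudio.

```python
from itertools import zip_longest
from typing import Tuple


def convolve_output_shape(x: Tuple[int, ...], y: Tuple[int, ...]) -> Tuple[int, ...]:
    """Output shape for x of shape (..., N) and y of shape (..., M), full mode."""
    *x_lead, n = x
    *y_lead, m = y
    lead = []
    # Compare leading dims right to left; a missing dim behaves like size 1.
    for a, b in zip_longest(reversed(x_lead), reversed(y_lead), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"Leading dims {x_lead} and {y_lead} are not broadcastable")
        lead.append(b if a == 1 else a)
    return tuple(reversed(lead)) + (n + m - 1,)
```

For example, inputs of shape `(2, 1, 100)` and `(1, 3, 10)` broadcast to leading dimensions `(2, 3)`, giving `(2, 3, 109)`.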
- 14 Feb, 2023 1 commit
Zhaoheng Ni authored
Summary: replicate of https://github.com/pytorch/audio/issues/2644 Pull Request resolved: https://github.com/pytorch/audio/pull/2880 Reviewed By: mthrok Differential Revision: D41633911 Pulled By: nateanl fbshipit-source-id: 73cf145d75c389e996aafe96571ab86dc21f86e5
- 07 Feb, 2023 1 commit
juan.azcarreta.ortiz authored
Summary: Allows users to play audio through the device speaker. Pull Request resolved: https://github.com/pytorch/audio/pull/3026 Test Plan: Created a new test that mocks a call to the write audio chunk method from StreamWriter. To run the test: `pytest test/torchaudio_unittest/io/_playback_test.py` Reviewed By: mthrok Differential Revision: D43082062 Pulled By: jazcarretao fbshipit-source-id: 01a85b32ce925687a633d1208d15d54556e89dd8
- 04 Feb, 2023 1 commit
Tristan Rice authored
Summary: This adds two 10-bit pixel formats, one for CPU and one for CUDA. This allows training on HDR/10-bit video datasets. Pull Request resolved: https://github.com/pytorch/audio/pull/3023 Test Plan:
```py
from torchaudio.io import StreamReader

# `reader` is an HEVC source (e.g. a file-like object), defined elsewhere.
# CUDA path: decode with the hevc_cuvid hardware decoder.
r = StreamReader(
    reader,
    format='hevc',
)
stream = r.add_video_stream(
    frames_per_chunk=-1,
    decoder="hevc_cuvid",
    hw_accel="cuda",
)
frame = next(r.stream())
```
```py
# CPU path: convert the decoded frames to 16-bit-per-channel RGB.
r = StreamReader(
    reader,
    format='hevc',
)
stream = r.add_video_stream(
    frames_per_chunk=-1,
    filter_desc="format=rgb48le",
)
frame = next(r.stream())
```
Reviewed By: xiaohui-zhang Differential Revision: D43019191 Pulled By: mthrok fbshipit-source-id: fe4359e525b24c8b856dfdf3d2f8596871566350
- 03 Feb, 2023 1 commit
moto authored
Summary: Add GitHub Actions-based GPU test jobs.
  - There appears to be a 2-hour upper limit, so only CUDA/GPU tests are run.
  - Since Kaldi-related features are not available, they are disabled.
Pull Request resolved: https://github.com/pytorch/audio/pull/3029 Reviewed By: hwangjeff Differential Revision: D42983800 Pulled By: mthrok fbshipit-source-id: 47fefe39c635d1c73ad6799ddacefd2666fe5403
- 01 Feb, 2023 2 commits
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3027 To support older NumPy versions, remove the use of `numpy.typing`. Reviewed By: nateanl Differential Revision: D42924428 fbshipit-source-id: af1a370b5baf00c63a088f172dbc2190d414bdf1
Wei Wang authored
Summary: PyTorch core has dropped Python 3.7 (https://github.com/pytorch/pytorch/pull/93155). Pull Request resolved: https://github.com/pytorch/audio/pull/3020 Reviewed By: mthrok Differential Revision: D42902346 Pulled By: weiwangmeta fbshipit-source-id: 07ab1aff0e128c5960d87e5fa29e341310dea388
- 27 Jan, 2023 1 commit
hwangjeff authored
Summary: Moves `AddNoise`, `Convolve`, `FFTConvolve`, `Speed`, `SpeedPerturbation`, `Deemphasis`, and `Preemphasis` out of `torchaudio.prototype.transforms` and into `torchaudio.transforms`. Pull Request resolved: https://github.com/pytorch/audio/pull/3009 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D42730322 Pulled By: hwangjeff fbshipit-source-id: 43739ac31437150d3127e51eddc0f0bba5facb15
- 26 Jan, 2023 1 commit
hwangjeff authored
Summary: Passing functions as test parameters causes issues on some platforms. This PR updates the functional tests to pass functions by name instead. Pull Request resolved: https://github.com/pytorch/audio/pull/3011 Reviewed By: mthrok Differential Revision: D42748106 Pulled By: hwangjeff fbshipit-source-id: 4d81dabe4aff2293bc344a457a034a2d9af024e2
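A standard-library sketch of the pattern: the test parameters are function names (strings), and the function object is resolved with `getattr` inside the test body. The `math` functions and `FunctionalByNameTest` are placeholders, and `subTest` stands in for whatever parameterization helper the real tests use.

```python
import math
import unittest

# Parameters are plain names rather than function objects.
FUNCTION_NAMES = ["sqrt", "exp", "log1p"]


class FunctionalByNameTest(unittest.TestCase):
    def test_functions_by_name(self):
        for name in FUNCTION_NAMES:
            with self.subTest(fn=name):
                fn = getattr(math, name)  # resolve the function at run time
                self.assertTrue(math.isfinite(fn(1.0)))
```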
- 24 Jan, 2023 1 commit
hwangjeff authored
Summary: Moves `add_noise`, `fftconvolve`, `convolve`, `speed`, `preemphasis`, and `deemphasis` out of `torchaudio.prototype.functional` and into `torchaudio.functional`. Pull Request resolved: https://github.com/pytorch/audio/pull/3001 Reviewed By: mthrok Differential Revision: D42688971 Pulled By: hwangjeff fbshipit-source-id: 43280bd3ffeccddae57f1092ac45afb64dd426cc
- 22 Jan, 2023 1 commit
moto authored
Summary: This commit makes `StreamReader` report the PTS (presentation time stamp) of the returned chunk as well. Example:
```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is a Torch tensor but has an extra `pts` attribute
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk
```
For backward compatibility, we introduce `_ChunkTensor`, a composition of a Tensor and metadata that works like a normal tensor in PyTorch operations. The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83). It was also suggested to attach the metadata directly to the Tensor object, but the possibility of collisions between torchaudio's metadata and new attributes introduced in PyTorch cannot be ignored, so we use a Tensor subclass implementation. If any unexpected issue arises from a metadata attribute name collision, client code can fetch the bare Tensor and continue. Pull Request resolved: https://github.com/pytorch/audio/pull/2975 Reviewed By: hwangjeff Differential Revision: D42526945 Pulled By: mthrok fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35
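The composition idea can be illustrated without PyTorch. The class below is a hypothetical pure-Python analog, not `_ChunkTensor` itself: it stores a payload plus a `pts` attribute and forwards other attribute lookups to the payload. The real implementation relies on PyTorch's tensor-subclass machinery instead.

```python
from typing import Any


class ChunkWithPTS:
    """Hypothetical analog of a 'value plus metadata' wrapper."""

    def __init__(self, data: Any, pts: float) -> None:
        self._data = data
        self.pts = pts

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the wrapper itself.
        if name == "_data":
            # Avoid infinite recursion if _data was never set (e.g. during copying).
            raise AttributeError(name)
        return getattr(self._data, name)

    # Special methods bypass __getattr__, so forward them explicitly.
    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: Any) -> Any:
        return self._data[index]

    def unwrap(self) -> Any:
        """Return the bare payload, e.g. if an attribute name ever collides."""
        return self._data
```

Because `__getattr__` does not cover special methods such as `len()` and indexing, a composition wrapper has to forward those explicitly.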
- 19 Jan, 2023 1 commit
hwangjeff authored
Summary: For greater flexibility, this PR makes argument `lengths` optional for `add_noise` and `AddNoise`. Pull Request resolved: https://github.com/pytorch/audio/pull/2977 Reviewed By: nateanl Differential Revision: D42484211 Pulled By: hwangjeff fbshipit-source-id: 54757dcc73df194bb98c1d9d42a2f43f3027b190
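For reference, scaling noise to a target SNR can be sketched as below. This is the standard SNR-based mixing formula, shown under the assumption that signal and noise energies are sums of squares; it is not claimed to be torchaudio's exact code. `add_noise_sketch` and its single `length` argument are hypothetical simplifications of the batched `lengths` input, with `None` meaning the whole signal is used.

```python
import math
from typing import List, Optional


def add_noise_sketch(
    signal: List[float], noise: List[float], snr_db: float, length: Optional[int] = None
) -> List[float]:
    """Scale `noise` so that the mix has the requested SNR (dB), then add it."""
    n = len(signal) if length is None else length  # None: use the full signal
    e_signal = sum(s * s for s in signal[:n])
    e_noise = sum(v * v for v in noise[:n])
    # SNR_dB = 10 * log10(E_signal / (scale**2 * E_noise)); solve for scale.
    scale = math.sqrt(e_signal / (e_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(signal, noise)]
```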
- 16 Jan, 2023 1 commit
moto authored
Summary: This change ensures that the number of Tensor frames stored in buffers is always a multiple of frames_per_chunk. This makes it easy to store PTS values in an aligned manner. Pull Request resolved: https://github.com/pytorch/audio/pull/2984 Reviewed By: nateanl Differential Revision: D42526670 Pulled By: mthrok fbshipit-source-id: d83ee914b7e50de3b51758069b0e0b6b3ebe2e54
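A sketch of aligned chunking, assuming a constant frame rate and a stream that starts at time 0: each full chunk of `frames_per_chunk` frames is tagged with the PTS of its first frame, and leftover frames stay buffered. `chunk_with_pts` is a hypothetical helper, not torchaudio's buffer code.

```python
from typing import List, Tuple


def chunk_with_pts(
    frames: List[int], frames_per_chunk: int, frame_rate: float
) -> Tuple[List[Tuple[float, List[int]]], List[int]]:
    """Split frames into full chunks, each tagged with its first frame's PTS (seconds)."""
    chunks = []
    num_full = len(frames) // frames_per_chunk
    for i in range(num_full):
        start = i * frames_per_chunk
        pts = start / frame_rate  # PTS of the chunk's first frame
        chunks.append((pts, frames[start:start + frames_per_chunk]))
    # Fewer than frames_per_chunk frames remain; keep them for the next call.
    leftover = frames[num_full * frames_per_chunk:]
    return chunks, leftover
```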
- 14 Jan, 2023 1 commit
Zhaoheng Ni authored
Summary: XLS-R tests are supposed to be skipped on GPU machines, but the [_skipIf](https://github.com/pytorch/audio/blob/main/test/torchaudio_unittest/common_utils/case_utils.py#L143-L145) decorator forces them to run. This PR skips the XLS-R tests when running in CI with CUDA available. Pull Request resolved: https://github.com/pytorch/audio/pull/2982 Reviewed By: xiaohui-zhang Differential Revision: D42520292 Pulled By: nateanl fbshipit-source-id: c6ee4d4a801245226c26d9cd13e039e8d910add2
- 13 Jan, 2023 1 commit
Zhaoheng Ni authored
Summary: XLSR (cross-lingual speech representation) is a set of cross-lingual self-supervised learning models for generating cross-lingual speech representations. It was first proposed in https://arxiv.org/pdf/2006.13979.pdf, with a model trained on 53 languages (so-called XLSR-53). This PR supports additional XLS-R models from https://arxiv.org/pdf/2111.09296.pdf, which have more parameters (300M, 1B, 2B) and are trained on 128 languages. Pull Request resolved: https://github.com/pytorch/audio/pull/2959 Reviewed By: mthrok Differential Revision: D42397643 Pulled By: nateanl fbshipit-source-id: 23e8e51a7cde0a226db4f4028db7df8f02b986ce
- 12 Jan, 2023 2 commits
mthrok authored
Summary:
* Refactor the `_extension` module so that
  * the implementation of the initialization logic and its execution are separated:
    * the logic goes to `_extension.utils`
    * the execution is in `_extension.__init__`
  * global variables are defined and modified in `__init__`.
* Replace `is_sox_available()` with `_extension._SOX_INITIALIZED`
* Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE`
* Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`.
* Merge the sox-related initialization logic into the `_extension.utils` module.

Pull Request resolved: https://github.com/pytorch/audio/pull/2968 Reviewed By: hwangjeff Differential Revision: D42387251 Pulled By: mthrok fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
moto authored
Summary: This commit adds support for `buffer_chunk_size=-1`, which disables dropping of buffered frames. Pull Request resolved: https://github.com/pytorch/audio/pull/2969 Reviewed By: xiaohui-zhang Differential Revision: D42403467 Pulled By: mthrok fbshipit-source-id: a0847e6878874ce7e4b0ec3f56e5fbb8ebdb5992
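A sketch of these buffering semantics using `collections.deque`, assuming that when the limit is exceeded the oldest chunks are dropped. `make_chunk_buffer` is a hypothetical helper, not part of torchaudio.

```python
from collections import deque
from typing import Deque, List


def make_chunk_buffer(buffer_chunk_size: int) -> Deque[List[int]]:
    """Hypothetical chunk buffer: a positive size keeps only the newest chunks; -1 keeps all."""
    if buffer_chunk_size == -1:
        return deque()  # unbounded: nothing is ever dropped
    if buffer_chunk_size <= 0:
        raise ValueError("buffer_chunk_size must be positive or -1")
    # A bounded deque discards the oldest item when a new one is appended at capacity.
    return deque(maxlen=buffer_chunk_size)
```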
- 10 Jan, 2023 1 commit
moto authored
Summary: The filter graph does not fall back to `best_effort_timestamp`, so applying filters (such as changing the fps) to videos without PTS values failed. This commit changes the behavior by overwriting the PTS values with `best_effort_timestamp`. Pull Request resolved: https://github.com/pytorch/audio/pull/2970 Reviewed By: YosuaMichael Differential Revision: D42425771 Pulled By: mthrok fbshipit-source-id: 7b7a033ea2ad89bb49d6e1663d35d377dab2aae9
- 06 Jan, 2023 2 commits
moto authored
Summary: This commit adds utility functions that fetch the available/supported formats/devices/codecs. These functions mostly mirror commands like `ffmpeg -decoders`. However, the `ffmpeg` CLI can report different results if there are multiple FFmpeg installations, or the CLI might not be available at all. Pull Request resolved: https://github.com/pytorch/audio/pull/2958 Reviewed By: hwangjeff Differential Revision: D42371640 Pulled By: mthrok fbshipit-source-id: 96a96183815a126cb1adc97ab7754aef216fff6f
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2963 Phaser batch consistency test takes longer than the rest. Change the sample rate from 44100 to 8000. Reviewed By: hwangjeff Differential Revision: D42379064 fbshipit-source-id: 2005b833c696bb3c2bb1d21c38c39e6163d81d53
- 05 Jan, 2023 2 commits
Zhaoheng Ni authored
Summary: The generator part of the HiFiGAN model is a vocoder, which converts a mel spectrogram to a waveform. Naming it vocoder makes its role easier to understand. Pull Request resolved: https://github.com/pytorch/audio/pull/2955 Reviewed By: carolineechen Differential Revision: D42348864 Pulled By: nateanl fbshipit-source-id: c45a2f8d8d205ee381178ae5d37e9790a257e1aa
Grigory Sizov authored
Summary: Closes [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314)

## Description
  - Add bundle `HIFIGAN_GENERATOR_V3_LJSPEECH` to prototypes. The bundle contains pre-trained HiFiGAN generator weights from the [original HiFiGAN publication](https://github.com/jik876/hifi-gan#pretrained-model), converted slightly to fit our model
  - Add tests
    - unit tests checking that the vocoder and mel-transform implementations in the bundle give the same results as the original ones. Part of the original HiFiGAN code is ported to this repo to enable these tests
    - integration test checking that the waveform reconstructed from a mel spectrogram by the bundle is close enough to the original
  - Add docs

Pull Request resolved: https://github.com/pytorch/audio/pull/2921 Reviewed By: nateanl, mthrok Differential Revision: D42034761 Pulled By: sgrigory fbshipit-source-id: 8b0dadeed510b3c9371d6aa2c46ec7d8378f6048