- 01 Feb, 2023 1 commit
-
-
Wei Wang authored
Summary: https://github.com/pytorch/pytorch/pull/93155 Core has dropped python3.7 Pull Request resolved: https://github.com/pytorch/audio/pull/3020 Reviewed By: mthrok Differential Revision: D42902346 Pulled By: weiwangmeta fbshipit-source-id: 07ab1aff0e128c5960d87e5fa29e341310dea388
-
- 31 Jan, 2023 1 commit
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3021 When the input format and the encode format differ in StreamWriter, a filter for format conversion is inserted. A temporary AVFilter (`dst_frame`) is used in this case, but FilterGraph handles the memory allocation, so there is no need to allocate it ourselves. This `dst_frame` is not used anywhere else, so we do not have to allocate memory for it at all. This commit removes the unnecessary memory allocation. Reviewed By: xiaohui-zhang Differential Revision: D42865042 fbshipit-source-id: 2673b06de1e905dc73a11e2ec1cc6ce7b525d451
-
- 30 Jan, 2023 2 commits
-
-
Yan Li authored
Summary: Currently there will be a few errors when this tutorial is run with a CUDA device. The reasons being: - The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU). - When performing further analysis and displaying of the output audio, we need to move them back from the GPU to the CPU. This is because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`). Pull Request resolved: https://github.com/pytorch/audio/pull/3017 Reviewed By: mthrok Differential Revision: D42828526 Pulled By: nateanl fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762
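The first fix can be illustrated with a minimal sketch. It assumes only that PyTorch is installed; the tensor shape and variable names are invented for illustration and are not taken from the tutorial. A dtype conversion is used alongside the device move so that the non-in-place behaviour is visible even on a CPU-only machine.

```python
import torch

# Pick a GPU when one is present; fall back to the CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the tutorial's source waveform.
waveform = torch.zeros(2, 16000)

# `Tensor.to()` returns a tensor instead of modifying the original in place,
# so discarding the return value leaves `waveform` unchanged.
waveform.to(torch.float64)
unchanged_dtype = waveform.dtype  # still torch.float32

# Assigning the return value is what actually rebinds the variable.
waveform = waveform.to(device=device, dtype=torch.float64)

# Functions that need CPU input get an explicit copy back on the CPU.
waveform_cpu = waveform.cpu()
```

The same pattern applies to any conversion done with `to()`: keep the returned tensor, and move results back with `cpu()` before handing them to CPU-only functions.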
-
moto authored
Summary: We often need to check which FFmpeg was found and linked when debugging an issue. The version number is often not enough, and there is no easy way to find where the library was found either. This commit adds a utility function that prints the build-time configuration. It helps to distinguish whether the linked FFmpeg is the one from the binary distribution built in CI or a locally built one. Pull Request resolved: https://github.com/pytorch/audio/pull/3014 Reviewed By: hwangjeff Differential Revision: D42794952 Pulled By: mthrok fbshipit-source-id: 91ed358fde8cfe9d6d950f34742b1722e729cf4e
-
- 27 Jan, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3013 Namespace clean up before publishing the torchaudio C++ API as prototype. Reviewed By: hwangjeff Differential Revision: D42699903 fbshipit-source-id: 8a9eed0390dfa4a152124b42f2b927dbdd3e23d2
-
DanilBaibak authored
Summary: Switch to Nova Linux Conda build. Pull Request resolved: https://github.com/pytorch/audio/pull/2899 Reviewed By: seemethere, osalpekar, mthrok Differential Revision: D42416835 Pulled By: DanilBaibak fbshipit-source-id: 70886c4ff6f3243b80059be9385269cc0f2d4764
-
hwangjeff authored
Summary: Moves `AddNoise`, `Convolve`, `FFTConvolve`, `Speed`, `SpeedPerturbation`, `Deemphasis`, and `Preemphasis` out of `torchaudio.prototype.transforms` and into `torchaudio.transforms`. Pull Request resolved: https://github.com/pytorch/audio/pull/3009 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D42730322 Pulled By: hwangjeff fbshipit-source-id: 43739ac31437150d3127e51eddc0f0bba5facb15
-
- 26 Jan, 2023 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3007 Simplify the construction of StreamReader/Writer in C++. Currently these classes require client code to build AVFormatContext manually. This is tedious and not user friendly. Some client code actually uses the same helper function that the TorchAudio codebase uses. This commit moves the helper logic inside the constructors of StreamReader/Writer, so that the signatures of these constructors are easy to use and similar to the Python interface. Reviewed By: xiaohui-zhang Differential Revision: D42662520 fbshipit-source-id: d95e5236810c48d7d9bd2d89c05d4f60a44b3ba1
-
hwangjeff authored
Summary: Passing functions as test parameters causes issues on some platforms. This PR updates the functional tests to pass functions by name instead. Pull Request resolved: https://github.com/pytorch/audio/pull/3011 Reviewed By: mthrok Differential Revision: D42748106 Pulled By: hwangjeff fbshipit-source-id: 4d81dabe4aff2293bc344a457a034a2d9af024e2
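A minimal sketch of the pass-by-name idea follows. The standard `math` module stands in for the module under test, and `FUNCTION_NAMES`, `resolve`, and `run_case` are hypothetical names chosen for illustration; this is not the actual torchaudio test code.

```python
import math

# Parameters are plain strings rather than function objects, so they are
# easy to print, compare, and pass between processes.
FUNCTION_NAMES = ["sqrt", "floor", "ceil"]


def resolve(module, name):
    """Look up the function under test by name at run time."""
    return getattr(module, name)


def run_case(name, value):
    """Run one parameterized case identified by a function name."""
    func = resolve(math, name)
    return func(value)


results = {name: run_case(name, 9.0) for name in FUNCTION_NAMES}
```

Because the lookup happens inside the test body, the parameter list itself contains nothing platform-specific.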
-
moto authored
Summary: These functions are called as part of sox initialization, thus it is no longer needed. Pull Request resolved: https://github.com/pytorch/audio/pull/3010 Reviewed By: hwangjeff Differential Revision: D42744478 Pulled By: mthrok fbshipit-source-id: 17d715b328392397ec47d81a533a307aac22862d
-
- 24 Jan, 2023 1 commit
-
-
hwangjeff authored
Summary: Moves `add_noise`, `fftconvolve`, `convolve`, `speed`, `preemphasis`, and `deemphasis` out of `torchaudio.prototype.functional` and into `torchaudio.functional`. Pull Request resolved: https://github.com/pytorch/audio/pull/3001 Reviewed By: mthrok Differential Revision: D42688971 Pulled By: hwangjeff fbshipit-source-id: 43280bd3ffeccddae57f1092ac45afb64dd426cc
-
- 23 Jan, 2023 3 commits
-
-
Nikita Shulga authored
Summary: We don't need the presence of physical HW to compile with CUDA. Likely one of the causes of https://github.com/pytorch/audio/issues/2979 (i.e. in CircleCI builds USE_CUDA were defined by CI environment, so nobody ever checked the default, but this is not the case in Nova builds) Pull Request resolved: https://github.com/pytorch/audio/pull/3005 Test Plan: Check that `compute.cu` is mentioned in builds, for example see https://github.com/pytorch/audio/actions/runs/3990295262/jobs/6843771056#step:9:829 ``` [193/202] /usr/local/cuda-11.6/bin/nvcc -forward-unknown-to-host-compiler -DINCLUDE_KALDI -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dlibtorchaudio_EXPORTS -I/__w/audio/audio/pytorch/audio -I/__w/audio/audio/pytorch/audio/third_party/kaldi/src -I/__w/audio/audio/pytorch/audio/third_party/kaldi/submodule/src -isystem=/__w/_temp/conda_environment_3990295262/lib/python3.7/site-packages/torch/include -isystem=/__w/_temp/conda_environment_3990295262/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem=/usr/local/cuda-11.6/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_50,code=compute_50 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr 
--expt-extended-lambda -O3 -DNDEBUG -Xcompiler=-fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 -MD -MT torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o -MF torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o.d -x cu -c /__w/audio/audio/pytorch/audio/torchaudio/csrc/rnnt/gpu/compute.cu -o torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o ``` Reviewed By: mthrok Differential Revision: D42687455 Pulled By: malfet fbshipit-source-id: c37ad58cc62439d1268865e9bf0bcb97079a529f
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3002 This commit merges `pop_chunks` and `pop_chunks_with_metadata`. In #2975 (D42526945 (https://github.com/pytorch/audio/commit/0dd59e0dda22eabf54fc95ad8050094df239bd39)), we updated StreamReader so that it returns PTS. In that PR, we introduced the `pop_chunks_with_metadata` method, so that the original `pop_chunks` method kept returning the same type and we could focus on the PTS logic in the code review. Now that the commit has landed, we merge the two methods, so that the original `pop_chunks` returns Tensor frames and metadata (PTS). Reviewed By: xiaohui-zhang Differential Revision: D42662321 fbshipit-source-id: 37ae088bc63fc516ea068698088925e8b31bc0a1
-
moto authored
Summary: This change fixes the issue where syntax highlighting is broken up per word. ## Plain Before <img width="243" alt="Screenshot 2023-01-20 at 1 28 48 PM" src="https://user-images.githubusercontent.com/855818/213778202-27ec8030-3f2f-4ef9-8210-bce7cfc3cb38.png"> After <img width="244" alt="Screenshot 2023-01-20 at 1 29 01 PM" src="https://user-images.githubusercontent.com/855818/213778231-61c52825-d63a-4913-b10d-a65f3b2cfbbb.png"> ## In articles Before <img width="786" alt="Screenshot 2023-01-20 at 1 34 12 PM" src="https://user-images.githubusercontent.com/855818/213779050-c21ba5e2-84b3-4935-bbab-6edcb7bc89ce.png"> After <img width="783" alt="Screenshot 2023-01-20 at 1 34 17 PM" src="https://user-images.githubusercontent.com/855818/213779069-f1406422-27a4-41cf-8ccd-5058f80860bd.png"> ## In tables Before <img width="813" alt="Screenshot 2023-01-20 at 1 27 35 PM" src="https://user-images.githubusercontent.com/855818/213778039-fede6f18-5a35-47f2-9e0b-a9be5716dc73.png"> After <img width="813" alt="Screenshot 2023-01-20 at 1 27 51 PM" src="https://user-images.githubusercontent.com/855818/213778073-e26275a9-d380-4601-aa92-84af7aeab00f.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/3000 Reviewed By: xiaohui-zhang Differential Revision: D42642522 Pulled By: mthrok fbshipit-source-id: 6831bb90da005aff8d7f178ef768e967bc6d2640
-
- 22 Jan, 2023 1 commit
-
-
moto authored
Summary: This commit makes `StreamReader` report the PTS (presentation time stamp) of the returned chunk as well. Example

```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is Torch tensor type but has extra attribute of PTS
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
```

For backward compatibility, we introduce `_ChunkTensor`, a composition of Tensor and metadata that works like a normal tensor in PyTorch operations. The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83). It was also suggested to attach metadata directly to the Tensor object, but the possibility of a collision between torchaudio's metadata and new attributes introduced in PyTorch cannot be ignored, so we use a Tensor subclass implementation. If any unexpected issue arises from a metadata attribute name collision, client code can fetch the bare Tensor and continue. Pull Request resolved: https://github.com/pytorch/audio/pull/2975 Reviewed By: hwangjeff Differential Revision: D42526945 Pulled By: mthrok fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35
-
- 20 Jan, 2023 4 commits
-
-
moto authored
Summary: Extraction from https://github.com/pytorch/audio/issues/2994 Add docstrings to C++ StreamReader/Writer. Pull Request resolved: https://github.com/pytorch/audio/pull/2997 Reviewed By: nateanl Differential Revision: D42628016 Pulled By: mthrok fbshipit-source-id: b22c43b80997af4a9087142340c67bed28e54917
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2999 Reviewed By: hwangjeff Differential Revision: D42637618 Pulled By: mthrok fbshipit-source-id: 35a7976c316e3b3899ae9c2202f132f1a960b736
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2996 Reviewed By: nateanl Differential Revision: D42624655 Pulled By: mthrok fbshipit-source-id: 8273cbfa529fbc2bd28adc9c63ceb9453838baa4
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2995 Reviewed By: nateanl Differential Revision: D42624676 Pulled By: mthrok fbshipit-source-id: 10fbdaada06ae78e5fa2253eb3331c93c032eeb3
-
- 19 Jan, 2023 3 commits
-
-
Zhaoheng Ni authored
Summary: TorchAudio currently has one training recipe for HuBERT + LibriSpeech pre-training. It may not be suitable when users want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). The PR addresses the issue by providing a modularized training recipe for audio self-supervised learning. Users can inject a customized model module, loss function, optimizer, lr scheduler, and datamodule for training an SSL model. Pull Request resolved: https://github.com/pytorch/audio/pull/2876 Reviewed By: hwangjeff Differential Revision: D42617414 Pulled By: nateanl fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
-
hwangjeff authored
Summary: In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead. Pull Request resolved: https://github.com/pytorch/audio/pull/2981 Reviewed By: mthrok Differential Revision: D42507228 Pulled By: hwangjeff fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
-
hwangjeff authored
Summary: For greater flexibility, this PR makes argument `lengths` optional for `add_noise` and `AddNoise`. Pull Request resolved: https://github.com/pytorch/audio/pull/2977 Reviewed By: nateanl Differential Revision: D42484211 Pulled By: hwangjeff fbshipit-source-id: 54757dcc73df194bb98c1d9d42a2f43f3027b190
-
- 17 Jan, 2023 2 commits
-
-
Moto Hira authored
Summary: When buffered data are cleared from ChunkedBuffer, the `num_buffered_frames` variable was not updated. This commit fixes that. Reviewed By: xiaohui-zhang Differential Revision: D42538519 fbshipit-source-id: a24a9afcebebd8956d977f05e9c2f0b603d060d1
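The class of bug described here can be sketched in a few lines. The class below is a hypothetical Python toy, not the C++ `ChunkedBuffer`; only the `num_buffered_frames` name comes from the summary, and the `push` and `flush` method names are assumptions.

```python
from collections import deque


class ChunkedBufferSketch:
    """Toy buffer that tracks how many frames it currently holds."""

    def __init__(self):
        self.chunks = deque()
        self.num_buffered_frames = 0

    def push(self, chunk):
        # Storage and counter are updated together.
        self.chunks.append(chunk)
        self.num_buffered_frames += len(chunk)

    def flush(self):
        # Clearing the storage and resetting the counter belong together;
        # omitting the second statement leaves a stale frame count.
        self.chunks.clear()
        self.num_buffered_frames = 0


buf = ChunkedBufferSketch()
buf.push([0, 1, 2])
buf.push([3, 4])
frames_before = buf.num_buffered_frames
buf.flush()
frames_after = buf.num_buffered_frames
```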
-
Zhaoheng Ni authored
Summary: The mel spectrograms in the TTS tutorial are upside down. The PR fixes it by using `origin="lower"` in imshow. Pull Request resolved: https://github.com/pytorch/audio/pull/2989 Reviewed By: mthrok Differential Revision: D42538349 Pulled By: nateanl fbshipit-source-id: 4388103a49bdfabf1705c1f979d44ecedd5c910a
-
- 16 Jan, 2023 4 commits
-
-
moto authored
Summary: Split `convert_video` into memory allocation function and write function. Also put all the buffer implementations into detail namespace. Pull Request resolved: https://github.com/pytorch/audio/pull/2988 Reviewed By: xiaohui-zhang Differential Revision: D42536769 Pulled By: mthrok fbshipit-source-id: 36fbf437d4bfd521322846161ae08a48c782c540
-
Robin Scheibler authored
Summary: The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following. 1. Uses `wsj0mix` consistently as the keyword indicating the WSJ0_2mix dataset 2. Corrects `args.data_dir` to `args.root_dir` in eval.py 3. Modifies the parameters of `pytorch_lightning.Trainer` according to the latest version (uses `accelerator="gpu"` and `devices=args.num_devices`, instead of just `gpus=args.num_devices`) Pull Request resolved: https://github.com/pytorch/audio/pull/2987 Reviewed By: xiaohui-zhang Differential Revision: D42536992 Pulled By: nateanl fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76
-
moto authored
Summary: So that the number of Tensor frames stored in buffers is always a multiple of frames_per_chunk. This makes it easy to store PTS values in aligned manner. Pull Request resolved: https://github.com/pytorch/audio/pull/2984 Reviewed By: nateanl Differential Revision: D42526670 Pulled By: mthrok fbshipit-source-id: d83ee914b7e50de3b51758069b0e0b6b3ebe2e54
-
moto authored
Summary: FilterGraph supports multi threading, and by default, the number of threads is determined automatically. Rather than an automatic behavior, which is unpredictable, it is better to fix the number of threads to 1. Follow-up: Add an interface to adjust it. Similar to https://github.com/pytorch/audio/pull/2949. Pull Request resolved: https://github.com/pytorch/audio/pull/2985 Reviewed By: nateanl Differential Revision: D42526958 Pulled By: mthrok fbshipit-source-id: c4f7f95317e93a39378107636a3ca30f6ddfe466
-
- 15 Jan, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The PR adds three `Wav2Vec2Bundle` pipeline objects for XLS-R models: - WAV2VEC2_XLSR_300M - WAV2VEC2_XLSR_1B - WAV2VEC2_XLSR_2B All three models use layer normalization in the feature extraction layers, hence `_normalize_waveform` is set to `True`. Pull Request resolved: https://github.com/pytorch/audio/pull/2978 Reviewed By: hwangjeff Differential Revision: D42501491 Pulled By: nateanl fbshipit-source-id: 2429ec880cc14798034843381e458e1b4664dac3
-
- 14 Jan, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: XLS-R tests are supposed to be skipped on gpu machines, but they are forced to run in [_skipIf](https://github.com/pytorch/audio/blob/main/test/torchaudio_unittest/common_utils/case_utils.py#L143-L145) decorator. This PR skips the XLS-R tests if the machine is CI and CUDA is available. Pull Request resolved: https://github.com/pytorch/audio/pull/2982 Reviewed By: xiaohui-zhang Differential Revision: D42520292 Pulled By: nateanl fbshipit-source-id: c6ee4d4a801245226c26d9cd13e039e8d910add2
-
- 13 Jan, 2023 2 commits
-
-
moto authored
Summary: Per the suggestion by nateanl, adding the visualization of feature fed to ASR. <img width="688" alt="Screen Shot 2023-01-12 at 8 19 59 PM" src="https://user-images.githubusercontent.com/855818/212215190-23be7553-4c04-40d9-944e-3ee2ff69c49b.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2974 Reviewed By: nateanl Differential Revision: D42484088 Pulled By: mthrok fbshipit-source-id: 2c839492869416554eac04aa06cd12078db21bd7
-
Zhaoheng Ni authored
Summary: XLSR (cross-lingual speech representation) is a set of cross-lingual self-supervised learning models for generating cross-lingual speech representations. It was first proposed in https://arxiv.org/pdf/2006.13979.pdf, where the model is trained on 53 languages (so-called XLSR-53). This PR supports more XLS-R models from https://arxiv.org/pdf/2111.09296.pdf that have more parameters (300M, 1B, 2B) and are trained on 128 languages. Pull Request resolved: https://github.com/pytorch/audio/pull/2959 Reviewed By: mthrok Differential Revision: D42397643 Pulled By: nateanl fbshipit-source-id: 23e8e51a7cde0a226db4f4028db7df8f02b986ce
-
- 12 Jan, 2023 4 commits
-
-
mthrok authored
Summary: * Refactor _extension module so that * the implementation of initialization logic and its execution are separated. * logic goes to `_extension.utils` * the execution is at `_extension.__init__` * global variables are defined and modified in `__init__`. * Replace `is_sox_available()` with `_extension._SOX_INITIALIZED` * Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE` * Move `requires_sox()` and `requires_kaldi()` to break the circular dependency among `_extension` and `_internal.module_utils`. * Merge the sox-related initialization logic in `_extension.utils` module. Pull Request resolved: https://github.com/pytorch/audio/pull/2968 Reviewed By: hwangjeff Differential Revision: D42387251 Pulled By: mthrok fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
-
moto authored
Summary: This commit adds methods to query the output configuration from a FilterGraph object. * time_base -> required to compute the PTS of an output frame * sample_rate, num_channels -> required to compute PTS and pre-allocate buffers for audio. Pull Request resolved: https://github.com/pytorch/audio/pull/2976 Reviewed By: xiaohui-zhang Differential Revision: D42466744 Pulled By: mthrok fbshipit-source-id: dd27109819bfb1fbe37b8233dd6a5e4224fe3f6c
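As a rough illustration of why these values are needed, the sketch below converts an integer PTS into seconds using a time base and derives a buffer shape from the channel count. The helper names, the numeric values, and the (time, channel) layout are assumptions for illustration, not the FilterGraph interface.

```python
from fractions import Fraction


def pts_in_seconds(pts_ticks, time_base):
    """Convert an integer PTS, counted in time_base units, into seconds."""
    return pts_ticks * time_base


def audio_buffer_shape(frames_per_chunk, num_channels):
    """Shape of a pre-allocated audio chunk, assumed to be (time, channel)."""
    return (frames_per_chunk, num_channels)


# Assumed example values: a 1/90000 time base for video, and an audio
# time base equal to 1/sample_rate with sample_rate = 48000.
video_seconds = pts_in_seconds(180000, Fraction(1, 90000))
audio_seconds = pts_in_seconds(24000, Fraction(1, 48000))
chunk_shape = audio_buffer_shape(4096, 2)
```

Using `Fraction` keeps the arithmetic exact, which mirrors how a rational time base avoids floating-point drift.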
-
moto authored
Summary: This commit adds `buffer_chunk_size=-1`, which does not drop buffered frames. Pull Request resolved: https://github.com/pytorch/audio/pull/2969 Reviewed By: xiaohui-zhang Differential Revision: D42403467 Pulled By: mthrok fbshipit-source-id: a0847e6878874ce7e4b0ec3f56e5fbb8ebdb5992
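The semantics can be sketched with a bounded queue. This is a hypothetical Python model, not the C++ buffer; it assumes that a bounded buffer drops its oldest chunks first, and `make_chunk_buffer` is an invented helper name.

```python
from collections import deque


def make_chunk_buffer(buffer_chunk_size):
    """Return a deque that keeps at most `buffer_chunk_size` chunks.

    A value of -1 is treated as unbounded, so nothing is ever dropped.
    """
    maxlen = None if buffer_chunk_size == -1 else buffer_chunk_size
    return deque(maxlen=maxlen)


bounded = make_chunk_buffer(3)
unbounded = make_chunk_buffer(-1)
for chunk_id in range(5):
    bounded.append(chunk_id)    # oldest entries fall off once full
    unbounded.append(chunk_id)  # everything is retained
```

An unbounded buffer trades memory for completeness, so it suits cases where the consumer is known to keep up or where every frame matters.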
-
moto authored
Summary: Following the change in PyTorch core. https://github.com/pytorch/pytorch/commit/87e4a087784c805312a2b48bb063d2400df26c5e Pull Request resolved: https://github.com/pytorch/audio/pull/2973 Reviewed By: xiaohui-zhang Differential Revision: D42462709 Pulled By: mthrok fbshipit-source-id: 60c2aa3d63fe25d8e0b7aa476404e7a55d6eb87f
-
- 11 Jan, 2023 1 commit
-
-
pbialecki authored
Summary: CC atalman Pull Request resolved: https://github.com/pytorch/audio/pull/2951 Reviewed By: mthrok Differential Revision: D42459205 Pulled By: atalman fbshipit-source-id: b2d7c5604ba1f3bb4d9a45a052ac41054acd52dd
-
- 10 Jan, 2023 2 commits
-
-
moto authored
Summary: The filter graph does not fall back to `best_effort_timestamp`, thus applying filters (like changing fps) on videos without PTS values failed. This commit changes the behavior by overwriting the PTS values with `best_effort_timestamp`. Pull Request resolved: https://github.com/pytorch/audio/pull/2970 Reviewed By: YosuaMichael Differential Revision: D42425771 Pulled By: mthrok fbshipit-source-id: 7b7a033ea2ad89bb49d6e1663d35d377dab2aae9
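The overwrite described above might look like the following in a simplified Python model. `FrameSketch` and `overwrite_pts` are hypothetical names; the real change operates on FFmpeg frames in C++. The sentinel value reflects FFmpeg's `AV_NOPTS_VALUE`, which is the minimum 64-bit signed integer.

```python
from dataclasses import dataclass

# FFmpeg marks a missing timestamp with AV_NOPTS_VALUE (INT64_MIN).
AV_NOPTS_VALUE = -(2 ** 63)


@dataclass
class FrameSketch:
    """Toy stand-in for a decoded frame with the two timestamp fields."""
    pts: int
    best_effort_timestamp: int


def overwrite_pts(frame):
    """Replace the PTS with the decoder's best-effort estimate."""
    frame.pts = frame.best_effort_timestamp
    return frame


# Frames from a video without PTS values, before they enter the filter graph.
frames = [FrameSketch(AV_NOPTS_VALUE, 0), FrameSketch(AV_NOPTS_VALUE, 3003)]
fixed_pts = [overwrite_pts(f).pts for f in frames]
```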
-
moto authored
Summary: * Add missing docstrings * Add default values Pull Request resolved: https://github.com/pytorch/audio/pull/2971 Reviewed By: xiaohui-zhang Differential Revision: D42425796 Pulled By: mthrok fbshipit-source-id: a6a946875142a54424c059bbfbab1908a1564bd3
-
- 06 Jan, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: `InverseMelScale` is missing from the nightly documentation webpage, and `MelScale` fits better in the Feature Extractions section. This PR moves both documents into the Feature Extractions section. Pull Request resolved: https://github.com/pytorch/audio/pull/2967 Reviewed By: mthrok Differential Revision: D42387886 Pulled By: nateanl fbshipit-source-id: cdac020887817ea2530bfb26e8ed414ae4761420
-