- 18 Sep, 2024 1 commit
mayp777 authored
- 06 Sep, 2024 1 commit
mayp777 authored
- 03 Sep, 2024 1 commit
mayp777 authored
- 02 Sep, 2024 1 commit
mayp777 authored
- 16 Oct, 2023 3 commits
flyingdown authored
flyingdown authored
Change the default compiler to hipcc. See merge request dcutoolkit/deeplearing/torchaudio!1
flyingdown authored
- 25 Aug, 2023 1 commit
flyingdown authored
- 14 Jun, 2023 1 commit
flyingdown authored
2. Added dcu_version and related dtk information
- 08 May, 2023 1 commit
flyingdown authored
- 05 May, 2023 1 commit
flyingdown authored
- 09 Dec, 2022 4 commits
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2911 Reviewed By: carolineechen Differential Revision: D41887854 Pulled By: mthrok fbshipit-source-id: eb91773ec67b4cda2d70733df450956d83742509
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2906 The correct way to create an AVFormatContext* for output is to pass the address of an uninitialized AVFormatContext* to the `avformat_alloc_output_context2` function. The current code pre-allocates an AVFormatContext* with `avformat_alloc_context`, and this allocated object is then lost inside `avformat_alloc_output_context2`. Reviewed By: xiaohui-zhang Differential Revision: D41865685 fbshipit-source-id: 9a9dc83b5acfe9b450f191fe716c85ebb5a5d842
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2905 In StreamWriter, if the tensor format is different from the encoding format, a FilterGraph object is automatically inserted to convert the format. The FilterGraph object operates on AVFrames. The input AVFrame must be allocated by us, but the output AVFrame is filled by FilterGraph, so there is no need to allocate it ourselves. The output AVFrame is then used as input to the encoder regardless of whether a FilterGraph was inserted, so the output AVFrame has to be allocated manually when no FilterGraph is used. The current code flips this condition: it incorrectly allocates the AVFrame when a FilterGraph is present and does not allocate it otherwise. This commit fixes that. Reviewed By: xiaohui-zhang Differential Revision: D41866198 fbshipit-source-id: 40799c147dc8166a979ecfb58ed8e502539a6aed
Andrey Talman authored
- 04 Dec, 2022 1 commit
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2885 In the `_init_hubert_pretrain_model` method, which initializes the HuBERT pretraining models, `kaiming_normal_` should be applied to the `ConvLayerBlock` rather than the `LayerNorm` layer. This PR fixes it and adds more unit tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2886 Reviewed By: hwangjeff Differential Revision: D41713801 Pulled By: nateanl fbshipit-source-id: ed199baf7504d06bbf2d31c522ae708a75426a2d
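For illustration only, a minimal sketch of the corrected initialization logic. `ConvLayerBlock` and `_init_conv_layers` below are simplified stand-ins, not torchaudio's actual classes or functions:

```
import torch.nn as nn

class ConvLayerBlock(nn.Module):
    """Simplified stand-in for the feature-extractor block: Conv1d followed by LayerNorm."""
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, stride=stride, bias=False)
        self.layer_norm = nn.LayerNorm(out_channels)

def _init_conv_layers(module: nn.Module) -> None:
    # Kaiming init targets the convolution weights inside each ConvLayerBlock;
    # the LayerNorm layers keep their default (ones/zeros) initialization.
    for m in module.modules():
        if isinstance(m, ConvLayerBlock):
            nn.init.kaiming_normal_(m.conv.weight)

feature_extractor = nn.Sequential(
    ConvLayerBlock(1, 512, kernel_size=10, stride=5),
    ConvLayerBlock(512, 512, kernel_size=3, stride=2),
)
_init_conv_layers(feature_extractor)
```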
- 18 Nov, 2022 4 commits
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2865 Reviewed By: carolineechen Differential Revision: D41403756 Pulled By: mthrok fbshipit-source-id: d193caa90e786f08f28e4cc2df4b4fb77aa8f592
Eli Uriegas authored
Summary: Makes explicit which versions of otool and install_name_tool we actually prefer, since using the ones from conda can produce inconsistent results. Fixes https://github.com/pytorch/audio/issues/2806 Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/audio/pull/2828 Reviewed By: malfet, mthrok Differential Revision: D40960633 Pulled By: seemethere fbshipit-source-id: 5010c06578f1efc4fe314f9a3ff47f18e14ad156
moto authored
Summary: StreamWriter assumed that the frame rate can always be expressed as 1/something, which is a reasonable assumption in most cases but does not always hold. This commit fixes it by properly computing time_base from the frame rate. Addresses https://github.com/pytorch/audio/issues/2830 Pull Request resolved: https://github.com/pytorch/audio/pull/2831 Reviewed By: carolineechen Differential Revision: D41036084 Pulled By: mthrok fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609
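As a hypothetical Python illustration of the idea (the actual fix lives in the C++ StreamWriter code): compute time_base as the exact reciprocal of the frame rate rather than assuming an integer frame rate.

```
from fractions import Fraction

def time_base_from_frame_rate(frame_rate) -> Fraction:
    """Return the time base as the exact reciprocal of the frame rate.

    For an integer rate such as 30 fps this is 1/30, but for NTSC-style
    rates such as 30000/1001 (~29.97 fps) the result is 1001/30000,
    which a plain "1/<integer>" assumption cannot represent.
    """
    rate = Fraction(frame_rate).limit_denominator(1_000_000)
    return Fraction(rate.denominator, rate.numerator)

print(time_base_from_frame_rate(30))                      # 1/30
print(time_base_from_frame_rate(Fraction(30000, 1001)))   # 1001/30000
```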
moto authored
Summary: Addresses https://github.com/pytorch/audio/issues/2790. Previously, AVPacket objects had duration==0. The `av_interleaved_write_frame` function was inferring the duration of packets by comparing them against the next ones, but it could not infer the duration of the last packet, as there is no subsequent frame, and thus omitted it from the final data. This commit fixes it by explicitly setting packet duration = 1 (one frame) for video only. (An audio AVPacket contains multiple samples, so it is different; tests were added to ensure correctness for audio.) Pull Request resolved: https://github.com/pytorch/audio/pull/2789 Reviewed By: xiaohui-zhang Differential Revision: D40627439 Pulled By: mthrok fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320
- 16 Nov, 2022 4 commits
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2847 In mixed precision training, the dtype of `mask_embedding` is **not** converted to fp16 automatically. This PR addresses the issue by casting `mask_embedding` to the dtype of `x`, enabling mixed precision training. Pull Request resolved: https://github.com/pytorch/audio/pull/2854 Reviewed By: carolineechen Differential Revision: D41343486 Pulled By: nateanl fbshipit-source-id: 4a5cbb429ff8ba5d3c439a3d5acb5094f66bf705
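A minimal sketch of the pattern involved, using hypothetical names rather than the actual Wav2Vec2/HuBERT module code: under mixed precision the features `x` may be fp16 while the learned `mask_embedding` parameter stays fp32, so the embedding is cast to `x.dtype` before being written into the masked positions.

```
import torch
import torch.nn as nn

class Masker(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Learned embedding that replaces masked frames; stored in fp32.
        self.mask_embedding = nn.Parameter(torch.empty(dim).uniform_())

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Cast to x.dtype so this also works when x is fp16 under autocast.
        x[mask] = self.mask_embedding.to(x.dtype)
        return x

masker = Masker(dim=8)
feats = torch.randn(2, 5, 8, dtype=torch.float16)
mask = torch.zeros(2, 5, dtype=torch.bool)
mask[:, 1] = True
out = masker(feats, mask)
print(out.dtype)  # torch.float16
```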
Zhaoheng Ni authored
Summary: - `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653; it now returns relative paths instead of absolute paths. This PR fixes the usage in the HuBERT fine-tuning recipe so it resolves the correct audio paths. - The model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`. - The input dimension of the CTC linear layer varies with the model architecture; update it in the lightning module. cc simpleoier Pull Request resolved: https://github.com/pytorch/audio/pull/2851 Reviewed By: carolineechen Differential Revision: D41327998 Pulled By: nateanl fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
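To illustrate the last point, a hedged sketch (not the recipe's actual code) of sizing the CTC head from the chosen architecture; the 1024/1280 embedding widths are assumptions based on the usual large/xlarge transformer sizes and should be checked against the real model configs:

```
import torch.nn as nn

# Assumed encoder embedding widths; verify against the actual model configurations.
ENCODER_DIM = {
    "hubert_pretrain_large": 1024,
    "hubert_pretrain_xlarge": 1280,
}

def build_ctc_head(model_name: str, num_tokens: int = 29) -> nn.Linear:
    # The CTC projection must match the encoder output width of the selected model.
    return nn.Linear(ENCODER_DIM[model_name], num_tokens)

head = build_ctc_head("hubert_pretrain_large")
print(head)  # Linear(in_features=1024, out_features=29, bias=True)
```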
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2845 Pull Request resolved: https://github.com/pytorch/audio/pull/2846 Reviewed By: carolineechen Differential Revision: D41251624 Pulled By: nateanl fbshipit-source-id: 1a363d2314d6a452f35c109b9730da64ada5a2fd
hwangjeff authored
- 15 Nov, 2022 1 commit
moto authored
Summary: * Add the new official torchaudio logo to the documentation/README. * Add a page for downloading the logo. Preview builds: https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/index.html and https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/logo.html (screenshots: https://user-images.githubusercontent.com/855818/201738349-9e248f15-dce2-4931-9066-aa898a53d6ad.png, https://user-images.githubusercontent.com/855818/201738420-ad0fda2f-f310-4802-851c-bbdf6c84c045.png) Pull Request resolved: https://github.com/pytorch/audio/pull/2802 Reviewed By: carolineechen Differential Revision: D41295277 Pulled By: mthrok fbshipit-source-id: 6615d00799c9611f875e8485459d800e350b3486
- 03 Nov, 2022 2 commits
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2825 Reviewed By: carolineechen Differential Revision: D40954522 Pulled By: mthrok fbshipit-source-id: 433fb856a74a340af4d49e5c65a6270f0b00c835
moto authored
Summary: The PyTorch logo is included in the pytorch doc theme (and cannot be changed without custom CSS), so there is no need to include it here. Pull Request resolved: https://github.com/pytorch/audio/pull/2824 Reviewed By: carolineechen Differential Revision: D40954564 Pulled By: mthrok fbshipit-source-id: 5e9a91fddcc92c141baf1996f721c09c037fb003
- 02 Nov, 2022 1 commit
moto authored
Summary: (screenshot: https://user-images.githubusercontent.com/855818/199173348-f463ae71-438c-4dad-a481-b65522a8e52f.png) Pull Request resolved: https://github.com/pytorch/audio/pull/2812 Reviewed By: carolineechen Differential Revision: D40919942 Pulled By: mthrok fbshipit-source-id: 18e5a709c262fb0b15ada0d303f1d0dee033beb1
- 29 Oct, 2022 1 commit
moto authored
- 20 Oct, 2022 1 commit
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2780 Pull Request resolved: https://github.com/pytorch/audio/pull/2781 Reviewed By: carolineechen, mthrok Differential Revision: D40556794 Pulled By: nateanl fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e
- 19 Oct, 2022 4 commits
Caroline Chen authored
Summary: Add the ability to load only improvised or only scripted utterances. Pull Request resolved: https://github.com/pytorch/audio/pull/2778 Reviewed By: nateanl Differential Revision: D40511865 Pulled By: carolineechen fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539
Caroline Chen authored
Summary: The previous download link for v0.02 downloaded only the training dataset rather than the entire dataset, resulting in issues when trying to access the testing or validation data. Pull Request resolved: https://github.com/pytorch/audio/pull/2777 Reviewed By: nateanl Differential Revision: D40480605 Pulled By: carolineechen fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103
Zhaoheng Ni authored
Summary: The file structure of VoxCeleb1 is as follows:

```
root/
└── wav/
    └── speaker_id folders
```

Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to prepare the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such distinction, so it is not necessary to put the extracted files into separate folders. This PR adds notes to the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the expected file structure. Pull Request resolved: https://github.com/pytorch/audio/pull/2776 Reviewed By: carolineechen Differential Revision: D40483707 Pulled By: nateanl fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775 Reviewed By: carolineechen Differential Revision: D40481144 Pulled By: nateanl fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e
- 18 Oct, 2022 1 commit
nateanl authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774 Reviewed By: carolineechen Differential Revision: D40445274 Pulled By: nateanl fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d
- 17 Oct, 2022 1 commit
moto authored
Summary:
* Refactor the benchmark script
* Rename the `time` variable to avoid (potentially) conflicting with the time module
* Fix the `beta` parameter in the benchmark (it was not used previously)
* Use the `timeit` module for the benchmark
* Add a plot
* Move the comment on the result to the end
* Add a link to an explanation of aliasing

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2773 Reviewed By: carolineechen Differential Revision: D40421337 Pulled By: mthrok fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a
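A minimal sketch of the benchmarking pattern, assuming an illustrative helper name rather than the tutorial's actual code: time `torchaudio.functional.resample` with `timeit` and forward `beta` explicitly.

```
import timeit

import torch
import torchaudio.functional as F

def benchmark_resample(waveform, sample_rate, resample_rate, beta=14.769656459379492, iters=100):
    # timeit handles the repetition loop and keeps us from reusing a variable
    # named `time`, which could shadow the standard-library module.
    elapsed = timeit.timeit(
        lambda: F.resample(
            waveform,
            sample_rate,
            resample_rate,
            lowpass_filter_width=6,
            resampling_method="sinc_interp_kaiser",  # "kaiser_window" on older torchaudio releases
            beta=beta,  # explicitly forwarded so the parameter is actually exercised
        ),
        number=iters,
    )
    return elapsed / iters

waveform = torch.randn(1, 16000)
print(f"{benchmark_resample(waveform, 16000, 8000) * 1e3:.3f} ms per call")
```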
- 14 Oct, 2022 2 commits
moto authored
Summary: In the StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure is kept unshown in the resulting tutorial via the ``sphinx_gallery_defer_figures`` directive. It turned out that this figure is instead shown with the next code block executed by Sphinx Gallery, so it appears in a totally unrelated place: https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html (screenshot: https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png) This commit fixes it by closing the figure. Pull Request resolved: https://github.com/pytorch/audio/pull/2771 Reviewed By: nateanl Differential Revision: D40382076 Pulled By: mthrok fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a
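A minimal sketch of the pattern (illustrative, not the tutorial's exact code): render the waveform figure to an in-memory image and then close it, so Sphinx Gallery has no open figure left to attach to the next cell.

```
import io

import matplotlib.pyplot as plt
import torch

def waveform_to_image(waveform: torch.Tensor) -> bytes:
    fig, ax = plt.subplots()
    ax.plot(waveform.numpy())
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    # Close the figure explicitly; otherwise Sphinx Gallery would pick it up
    # and display it alongside the *next* executed code block.
    plt.close(fig)
    return buf.getvalue()

png_bytes = waveform_to_image(torch.sin(torch.linspace(0, 6.28, 8000)))
print(len(png_bytes), "bytes")
```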
nateanl authored
Summary: The separation is applied to chunks of audio to avoid OOM; consecutive chunks are combined by cross-fading their overlapping regions (see the diagram in the PR). For the last audio chunk there is no future chunk to combine with, hence the overlap on the right side does not need to be faded. Pull Request resolved: https://github.com/pytorch/audio/pull/2769 Reviewed By: carolineechen Differential Revision: D40358382 Pulled By: nateanl fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e
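A simplified sketch of the chunked overlap-add pattern with `torchaudio.transforms.Fade` (an identity stand-in is used in place of the separation model, and the multi-source output dimension is omitted):

```
import torch
from torchaudio.transforms import Fade

def process_in_chunks(apply_model, mix: torch.Tensor, chunk_len: int, overlap: int) -> torch.Tensor:
    """Apply `apply_model` chunk by chunk and cross-fade the overlapping regions."""
    length = mix.shape[-1]
    out = torch.zeros_like(mix)
    fade = Fade(fade_in_len=0, fade_out_len=overlap, fade_shape="linear")
    start = 0
    while start < length:
        end = min(start + chunk_len, length)
        chunk_out = apply_model(mix[..., start:end])
        if start > 0:
            # Fade in the left overlap so it blends with the previous chunk's fade-out.
            fade.fade_in_len = overlap
        if end == length:
            # Last chunk: no future chunk to blend with, so skip the right-side fade-out.
            fade.fade_out_len = 0
        out[..., start:end] += fade(chunk_out)
        start = end - overlap if end < length else end
    return out

mix = torch.randn(2, 16000)  # stereo, 1 second at 16 kHz
result = process_in_chunks(lambda x: x, mix, chunk_len=4000, overlap=400)
print(result.shape)
```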
- 13 Oct, 2022 2 commits
moto authored
Summary:
* Document `__call__` instead of `__init__`
* List CTCHypothesis first as it is used in combination with CTCDecoder
* Fix indentation of the score method docstring

Pull Request resolved: https://github.com/pytorch/audio/pull/2766 Reviewed By: carolineechen Differential Revision: D40349388 Pulled By: mthrok fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c
Nikita Shulga authored
Summary: `publishe` -> `published`. Also, not sure whether it should be `pre-trained weight is published` or `pre-trained weights are published`. Pull Request resolved: https://github.com/pytorch/audio/pull/2761 Reviewed By: carolineechen Differential Revision: D40313042 Pulled By: malfet fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b