- 18 Sep, 2024 1 commit
mayp777 authored
- 06 Sep, 2024 1 commit
mayp777 authored
- 03 Sep, 2024 1 commit
mayp777 authored
- 02 Sep, 2024 1 commit
mayp777 authored
- 16 Oct, 2023 3 commits
flyingdown authored
flyingdown authored
Change the default compiler to hipcc. See merge request dcutoolkit/deeplearing/torchaudio!1
flyingdown authored
- 25 Aug, 2023 1 commit
flyingdown authored
- 14 Jun, 2023 1 commit
flyingdown authored
2. Added dcu_version and related dtk information
- 08 May, 2023 1 commit
flyingdown authored
- 05 May, 2023 1 commit
flyingdown authored
- 09 Dec, 2022 4 commits
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2911 Reviewed By: carolineechen Differential Revision: D41887854 Pulled By: mthrok fbshipit-source-id: eb91773ec67b4cda2d70733df450956d83742509
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2906 The correct way to create an AVFormatContext* for output is to pass the address of an uninitialized AVFormatContext* to the `avformat_alloc_output_context2` function. The current code pre-allocates an AVFormatContext* with `avformat_alloc_context`, and this allocated object is then lost inside `avformat_alloc_output_context2`. Reviewed By: xiaohui-zhang Differential Revision: D41865685 fbshipit-source-id: 9a9dc83b5acfe9b450f191fe716c85ebb5a5d842
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2905 In StreamWriter, if the tensor format is different from the encoding format, a FilterGraph object is automatically inserted to convert the format. The FilterGraph object operates on AVFrames. The input AVFrame must be allocated by us, but the output AVFrame is filled by FilterGraph, so there is no need to allocate it ourselves. The output AVFrame is then used as input to the encoder regardless of whether a FilterGraph was inserted, so the output AVFrame has to be allocated manually when no FilterGraph is used. The current code flips this condition: it incorrectly allocates the AVFrame when a FilterGraph is present and does not allocate it otherwise. This commit fixes that. Reviewed By: xiaohui-zhang Differential Revision: D41866198 fbshipit-source-id: 40799c147dc8166a979ecfb58ed8e502539a6aed
Andrey Talman authored
- 04 Dec, 2022 1 commit
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2885 In the `_init_hubert_pretrain_model` method, which initializes the HuBERT pretraining models, `kaiming_normal_` should be applied to the `ConvLayerBlock` rather than the `LayerNorm` layer. This PR fixes it and adds more unit tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2886 Reviewed By: hwangjeff Differential Revision: D41713801 Pulled By: nateanl fbshipit-source-id: ed199baf7504d06bbf2d31c522ae708a75426a2d
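For illustration only, a minimal sketch of the corrected initialization logic. `ConvLayerBlock` and `_init_conv_layers` below are simplified stand-ins, not torchaudio's actual classes or functions:

```
import torch.nn as nn

class ConvLayerBlock(nn.Module):
    """Simplified stand-in for the feature-extractor block: Conv1d followed by LayerNorm."""
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, stride=stride, bias=False)
        self.layer_norm = nn.LayerNorm(out_channels)

def _init_conv_layers(module: nn.Module) -> None:
    # Kaiming init targets the convolution weights inside each ConvLayerBlock;
    # the LayerNorm layers keep their default (ones/zeros) initialization.
    for m in module.modules():
        if isinstance(m, ConvLayerBlock):
            nn.init.kaiming_normal_(m.conv.weight)

feature_extractor = nn.Sequential(
    ConvLayerBlock(1, 512, kernel_size=10, stride=5),
    ConvLayerBlock(512, 512, kernel_size=3, stride=2),
)
_init_conv_layers(feature_extractor)
```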
- 18 Nov, 2022 4 commits
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2865 Reviewed By: carolineechen Differential Revision: D41403756 Pulled By: mthrok fbshipit-source-id: d193caa90e786f08f28e4cc2df4b4fb77aa8f592
Eli Uriegas authored
Summary: Makes explicit which versions of otool and install_name_tool we actually prefer, since using the ones from conda can produce inconsistent results. Fixes https://github.com/pytorch/audio/issues/2806 Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/audio/pull/2828 Reviewed By: malfet, mthrok Differential Revision: D40960633 Pulled By: seemethere fbshipit-source-id: 5010c06578f1efc4fe314f9a3ff47f18e14ad156
moto authored
Summary: StreamWriter assumed that the frame rate can always be expressed as 1/something, which is a reasonable assumption in most cases but does not always hold. This commit fixes it by properly computing time_base from the frame rate. Addresses https://github.com/pytorch/audio/issues/2830 Pull Request resolved: https://github.com/pytorch/audio/pull/2831 Reviewed By: carolineechen Differential Revision: D41036084 Pulled By: mthrok fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609
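As a hypothetical Python illustration of the idea (the actual fix lives in the C++ StreamWriter code): compute time_base as the exact reciprocal of the frame rate rather than assuming an integer frame rate.

```
from fractions import Fraction

def time_base_from_frame_rate(frame_rate) -> Fraction:
    """Return the time base as the exact reciprocal of the frame rate.

    For an integer rate such as 30 fps this is 1/30, but for NTSC-style
    rates such as 30000/1001 (~29.97 fps) the result is 1001/30000,
    which a plain "1/<integer>" assumption cannot represent.
    """
    rate = Fraction(frame_rate).limit_denominator(1_000_000)
    return Fraction(rate.denominator, rate.numerator)

print(time_base_from_frame_rate(30))                      # 1/30
print(time_base_from_frame_rate(Fraction(30000, 1001)))   # 1001/30000
```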
moto authored
Summary: Addresses https://github.com/pytorch/audio/issues/2790. Previously, AVPacket objects had duration==0. The `av_interleaved_write_frame` function was inferring the duration of packets by comparing them against the next ones, but it could not infer the duration of the last packet, as there is no subsequent frame, and thus omitted it from the final data. This commit fixes it by explicitly setting packet duration = 1 (one frame) for video only. (An audio AVPacket contains multiple samples, so it is different; tests were added to ensure correctness for audio.) Pull Request resolved: https://github.com/pytorch/audio/pull/2789 Reviewed By: xiaohui-zhang Differential Revision: D40627439 Pulled By: mthrok fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320
- 16 Nov, 2022 4 commits
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2847 In mixed precision training, the dtype of `mask_embedding` is **not** converted to fp16 automatically. This PR addresses the issue by casting `mask_embedding` to the dtype of `x`, enabling mixed precision training. Pull Request resolved: https://github.com/pytorch/audio/pull/2854 Reviewed By: carolineechen Differential Revision: D41343486 Pulled By: nateanl fbshipit-source-id: 4a5cbb429ff8ba5d3c439a3d5acb5094f66bf705
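A minimal sketch of the pattern involved, using hypothetical names rather than the actual Wav2Vec2/HuBERT module code: under mixed precision the features `x` may be fp16 while the learned `mask_embedding` parameter stays fp32, so the embedding is cast to `x.dtype` before being written into the masked positions.

```
import torch
import torch.nn as nn

class Masker(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Learned embedding that replaces masked frames; stored in fp32.
        self.mask_embedding = nn.Parameter(torch.empty(dim).uniform_())

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Cast to x.dtype so this also works when x is fp16 under autocast.
        x[mask] = self.mask_embedding.to(x.dtype)
        return x

masker = Masker(dim=8)
feats = torch.randn(2, 5, 8, dtype=torch.float16)
mask = torch.zeros(2, 5, dtype=torch.bool)
mask[:, 1] = True
out = masker(feats, mask)
print(out.dtype)  # torch.float16
```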
Zhaoheng Ni authored
Summary: - `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653; it now returns relative paths instead of absolute paths. This PR fixes the usage in the HuBERT fine-tuning recipe so it resolves the correct audio paths. - The model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`. - The input dimension of the CTC linear layer varies with the model architecture; update it in the lightning module. cc simpleoier Pull Request resolved: https://github.com/pytorch/audio/pull/2851 Reviewed By: carolineechen Differential Revision: D41327998 Pulled By: nateanl fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
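To illustrate the last point, a hedged sketch (not the recipe's actual code) of sizing the CTC head from the chosen architecture; the 1024/1280 embedding widths are assumptions based on the usual large/xlarge transformer sizes and should be checked against the real model configs:

```
import torch.nn as nn

# Assumed encoder embedding widths; verify against the actual model configurations.
ENCODER_DIM = {
    "hubert_pretrain_large": 1024,
    "hubert_pretrain_xlarge": 1280,
}

def build_ctc_head(model_name: str, num_tokens: int = 29) -> nn.Linear:
    # The CTC projection must match the encoder output width of the selected model.
    return nn.Linear(ENCODER_DIM[model_name], num_tokens)

head = build_ctc_head("hubert_pretrain_large")
print(head)  # Linear(in_features=1024, out_features=29, bias=True)
```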
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2845 Pull Request resolved: https://github.com/pytorch/audio/pull/2846 Reviewed By: carolineechen Differential Revision: D41251624 Pulled By: nateanl fbshipit-source-id: 1a363d2314d6a452f35c109b9730da64ada5a2fd
hwangjeff authored
- 15 Nov, 2022 1 commit
moto authored
Summary: * Add the new official torchaudio logo to the documentation/README. * Add a page for downloading the logo. Preview builds: https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/index.html and https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/logo.html (screenshots: https://user-images.githubusercontent.com/855818/201738349-9e248f15-dce2-4931-9066-aa898a53d6ad.png, https://user-images.githubusercontent.com/855818/201738420-ad0fda2f-f310-4802-851c-bbdf6c84c045.png) Pull Request resolved: https://github.com/pytorch/audio/pull/2802 Reviewed By: carolineechen Differential Revision: D41295277 Pulled By: mthrok fbshipit-source-id: 6615d00799c9611f875e8485459d800e350b3486
- 03 Nov, 2022 2 commits
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2825 Reviewed By: carolineechen Differential Revision: D40954522 Pulled By: mthrok fbshipit-source-id: 433fb856a74a340af4d49e5c65a6270f0b00c835
moto authored
Summary: The PyTorch logo is included in the pytorch doc theme (and cannot be changed without custom CSS), so there is no need to include it here. Pull Request resolved: https://github.com/pytorch/audio/pull/2824 Reviewed By: carolineechen Differential Revision: D40954564 Pulled By: mthrok fbshipit-source-id: 5e9a91fddcc92c141baf1996f721c09c037fb003
- 02 Nov, 2022 1 commit
moto authored
Summary: (screenshot: https://user-images.githubusercontent.com/855818/199173348-f463ae71-438c-4dad-a481-b65522a8e52f.png) Pull Request resolved: https://github.com/pytorch/audio/pull/2812 Reviewed By: carolineechen Differential Revision: D40919942 Pulled By: mthrok fbshipit-source-id: 18e5a709c262fb0b15ada0d303f1d0dee033beb1
- 29 Oct, 2022 1 commit
moto authored
- 20 Oct, 2022 1 commit
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2780 Pull Request resolved: https://github.com/pytorch/audio/pull/2781 Reviewed By: carolineechen, mthrok Differential Revision: D40556794 Pulled By: nateanl fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e
- 19 Oct, 2022 4 commits
Caroline Chen authored
Summary: Add the ability to load only improvised or only scripted utterances. Pull Request resolved: https://github.com/pytorch/audio/pull/2778 Reviewed By: nateanl Differential Revision: D40511865 Pulled By: carolineechen fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539
Caroline Chen authored
Summary: The previous download link for v0.02 downloaded only the training dataset rather than the entire dataset, resulting in issues when trying to access the testing or validation data. Pull Request resolved: https://github.com/pytorch/audio/pull/2777 Reviewed By: nateanl Differential Revision: D40480605 Pulled By: carolineechen fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103
Zhaoheng Ni authored
Summary: The file structure of VoxCeleb1 is as follows:

```
root/
└── wav/
    └── speaker_id folders
```

Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to prepare the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such distinction, so it is not necessary to put the extracted files into separate folders. This PR adds notes to the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the expected file structure. Pull Request resolved: https://github.com/pytorch/audio/pull/2776 Reviewed By: carolineechen Differential Revision: D40483707 Pulled By: nateanl fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775 Reviewed By: carolineechen Differential Revision: D40481144 Pulled By: nateanl fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e
- 18 Oct, 2022 1 commit
nateanl authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774 Reviewed By: carolineechen Differential Revision: D40445274 Pulled By: nateanl fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d
- 17 Oct, 2022 1 commit
moto authored
Summary:
* Refactor the benchmark script
* Rename the `time` variable to avoid (potentially) conflicting with the time module
* Fix the `beta` parameter in the benchmark (it was not used previously)
* Use the `timeit` module for the benchmark
* Add a plot
* Move the comment on the result to the end
* Add a link to an explanation of aliasing

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2773 Reviewed By: carolineechen Differential Revision: D40421337 Pulled By: mthrok fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a
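A minimal sketch of the benchmarking pattern, assuming an illustrative helper name rather than the tutorial's actual code: time `torchaudio.functional.resample` with `timeit` and forward `beta` explicitly.

```
import timeit

import torch
import torchaudio.functional as F

def benchmark_resample(waveform, sample_rate, resample_rate, beta=14.769656459379492, iters=100):
    # timeit handles the repetition loop and keeps us from reusing a variable
    # named `time`, which could shadow the standard-library module.
    elapsed = timeit.timeit(
        lambda: F.resample(
            waveform,
            sample_rate,
            resample_rate,
            lowpass_filter_width=6,
            resampling_method="sinc_interp_kaiser",  # "kaiser_window" on older torchaudio releases
            beta=beta,  # explicitly forwarded so the parameter is actually exercised
        ),
        number=iters,
    )
    return elapsed / iters

waveform = torch.randn(1, 16000)
print(f"{benchmark_resample(waveform, 16000, 8000) * 1e3:.3f} ms per call")
```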
- 14 Oct, 2022 2 commits
moto authored
Summary: In the StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure is kept unshown in the resulting tutorial via the ``sphinx_gallery_defer_figures`` directive. It turned out that this figure is instead shown with the next code block executed by Sphinx Gallery, so it appears in a totally unrelated place: https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html (screenshot: https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png) This commit fixes it by closing the figure. Pull Request resolved: https://github.com/pytorch/audio/pull/2771 Reviewed By: nateanl Differential Revision: D40382076 Pulled By: mthrok fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a
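A minimal sketch of the pattern (illustrative, not the tutorial's exact code): render the waveform figure to an in-memory image and then close it, so Sphinx Gallery has no open figure left to attach to the next cell.

```
import io

import matplotlib.pyplot as plt
import torch

def waveform_to_image(waveform: torch.Tensor) -> bytes:
    fig, ax = plt.subplots()
    ax.plot(waveform.numpy())
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    # Close the figure explicitly; otherwise Sphinx Gallery would pick it up
    # and display it alongside the *next* executed code block.
    plt.close(fig)
    return buf.getvalue()

png_bytes = waveform_to_image(torch.sin(torch.linspace(0, 6.28, 8000)))
print(len(png_bytes), "bytes")
```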
nateanl authored
Summary: The separation is applied to chunks of audio to avoid OOM; consecutive chunks are combined by cross-fading their overlapping regions (see the diagram in the PR). For the last audio chunk there is no future chunk to combine with, hence the overlap on the right side does not need to be faded. Pull Request resolved: https://github.com/pytorch/audio/pull/2769 Reviewed By: carolineechen Differential Revision: D40358382 Pulled By: nateanl fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e
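A simplified sketch of the chunked overlap-add pattern with `torchaudio.transforms.Fade` (an identity stand-in is used in place of the separation model, and the multi-source output dimension is omitted):

```
import torch
from torchaudio.transforms import Fade

def process_in_chunks(apply_model, mix: torch.Tensor, chunk_len: int, overlap: int) -> torch.Tensor:
    """Apply `apply_model` chunk by chunk and cross-fade the overlapping regions."""
    length = mix.shape[-1]
    out = torch.zeros_like(mix)
    fade = Fade(fade_in_len=0, fade_out_len=overlap, fade_shape="linear")
    start = 0
    while start < length:
        end = min(start + chunk_len, length)
        chunk_out = apply_model(mix[..., start:end])
        if start > 0:
            # Fade in the left overlap so it blends with the previous chunk's fade-out.
            fade.fade_in_len = overlap
        if end == length:
            # Last chunk: no future chunk to blend with, so skip the right-side fade-out.
            fade.fade_out_len = 0
        out[..., start:end] += fade(chunk_out)
        start = end - overlap if end < length else end
    return out

mix = torch.randn(2, 16000)  # stereo, 1 second at 16 kHz
result = process_in_chunks(lambda x: x, mix, chunk_len=4000, overlap=400)
print(result.shape)
```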
- 13 Oct, 2022 2 commits
moto authored
Summary:
* Document `__call__` instead of `__init__`
* List CTCHypothesis first as it is used in combination with CTCDecoder
* Fix indentation of the score method docstring

Pull Request resolved: https://github.com/pytorch/audio/pull/2766 Reviewed By: carolineechen Differential Revision: D40349388 Pulled By: mthrok fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c
Nikita Shulga authored
Summary: `publishe` -> `published`. Also, not sure whether it should be `pre-trained weight is published` or `pre-trained weights are published`. Pull Request resolved: https://github.com/pytorch/audio/pull/2761 Reviewed By: carolineechen Differential Revision: D40313042 Pulled By: malfet fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b