Commits · 91db978bb14ab3ef818c0b75036465fdb3a2fa0b · OpenDAS / Torchaudio

07 Jun, 2023 1 commit

Make dlopen ffmpeg default off (#3418) · 91db978b

moto authored Jun 07, 2023

Summary:
To investigate https://github.com/pytorch/audio/issues/3411

Pull Request resolved: https://github.com/pytorch/audio/pull/3418

Differential Revision: D46535891

Pulled By: mthrok

fbshipit-source-id: b90bba399eb54f9f0ae073bd590cd8a46054ed7e

91db978b

05 Jun, 2023 1 commit

Clean-up ComputeKaldiPitch residue (#3403) · c076d1a8

moto authored Jun 05, 2023

Summary:
Follow up of: https://github.com/pytorch/audio/pull/3368

Remove files and lines no longer used.

Pull Request resolved: https://github.com/pytorch/audio/pull/3403

Differential Revision: D46441462

Pulled By: mthrok

fbshipit-source-id: 11b881ec4b24fa0d625c6aee9f4bd91f637f9923

c076d1a8

03 Jun, 2023 1 commit

[audio][PR] Add option to dlopen FFmpeg libraries (#3402) · b7d3e89a

Moto Hira authored Jun 02, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3402

This is a second attempt of https://github.com/pytorch/audio/pull/3353.

The basic logic to enable dlopen for FFmpeg libraries are same.
It uses `at::DynamicLibrary`, which allows to compile torchaudio without
linking FFmpeg libraries.

This time, the option to enable this feature DLOPEN_FFMPEG has been added,
so that users have a way to disable this feature and keep using build-time
linking.

Please refer to stub.h for more technical detail.

Differential Revision: D46403783

fbshipit-source-id: ca3db57ff6bdc50c8c225d22f12f3e76c6dc3f16

b7d3e89a

20 May, 2023 1 commit

[audio][PR] Add forced_align function to torchaudio (#3348) · e7935cff

Zhaoheng Ni authored May 19, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3348

The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA deviced. The function takes the CTC emissions and target labels as inputs and generates the corresponding labels for each frame.

Reviewed By: vineelpratap, xiaohui-zhang

Differential Revision: D45867265

fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb

e7935cff

28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA based ctc prefix beam search decoder.

Attach serveral benchmark results using V100 below:
|decoder type| model |datasets       | decoding time (secs)| beam size | batch size | model unit | subsampling times | vocab size |
|--------------|---------|------|-----------------|------------|-------------|------------|-----------------------|------------|
| cuctc |  conformer nemo    |dev clean        |7.68s | 8           |  32       | bpe         |    4  | 1000|
| cuctc |  conformer nemo   |dev clean  (sort by length)      |1.6s | 8           |  32       | bpe         |    4  | 1000|
| cuctc |  wav2vec2.0 torchaudio |dev clean                                |22s | 10           |  1       | char         |    2  | 29|
| cuctc |   conformer espnet   |aishell1 test                             | 5s | 10           |  24       | char         |    4  | 4233|

Note:
1.  The design is to parallel computation through batch and vocab axis, for loop the frames axis. So it's more friendly with smaller sequence lengths, larger vocab size comparing with CPU implementations.
2. WER is the same as CPU implementations. However, it can't decode with LM now.

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155

0a1801ed

05 Apr, 2023 1 commit

Remove source for flashlight-text bundle (#3236) · 5053aa7f

moto authored Apr 05, 2023

Summary:
Following https://github.com/pytorch/audio/pull/3232, static build of flashlight-text has been disabled and removed from nightly build.

This commit removes the related source/build from torchaudio code base.

Pull Request resolved: https://github.com/pytorch/audio/pull/3236

Reviewed By: jacobkahn

Differential Revision: D44712539

Pulled By: mthrok

fbshipit-source-id: a201c89b5046f224526309cd4e17a5105e58a949

5053aa7f

04 Apr, 2023 1 commit

Disable CTC decoder bundle by default (#3232) · 3844a2bd

moto authored Apr 04, 2023

Summary:
As we migrate to use upstream flashlight-text and KenLM, this PR disable building CTC decoder by default.
This will stop shipping flashlight-text and KenLM bundle in torchaudio binary.

Ref: https://github.com/pytorch/audio/issues/3088

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/3232

Reviewed By: hwangjeff

Differential Revision: D44650872

Pulled By: mthrok

fbshipit-source-id: 2415623abaf3cafa181135db5112d3c711137cd7

3844a2bd

14 Feb, 2023 1 commit

Add simulate_rir_ism method for room impulse response simulation (#2880) · 8c5c9a9b

Zhaoheng Ni authored Feb 14, 2023

Summary:
replicate of https://github.com/pytorch/audio/issues/2644

Pull Request resolved: https://github.com/pytorch/audio/pull/2880

Reviewed By: mthrok

Differential Revision: D41633911

Pulled By: nateanl

fbshipit-source-id: 73cf145d75c389e996aafe96571ab86dc21f86e5

8c5c9a9b

09 Feb, 2023 1 commit

Updated USE_ROCM detection (#3008) · 05d597fa

DanilBaibak authored Feb 09, 2023

Summary:
We don't need the presence of physical HW to compile with CUDA.

This is a follow up PR regarding `USE_ROCM` for issue https://github.com/pytorch/audio/issues/2979.

Pull Request resolved: https://github.com/pytorch/audio/pull/3008

Reviewed By: malfet

Differential Revision: D42708862

Pulled By: DanilBaibak

fbshipit-source-id: 90cedc80a2d180ca1e0912ad5b644398182417b8

05d597fa

23 Jan, 2023 1 commit

Tweak `USE_CUDA` detection (#3005) · 09e7d818

Nikita Shulga authored Jan 23, 2023

Summary:
We don't need the presence of physical HW to compile with CUDA.

Likely one of the causes of  https://github.com/pytorch/audio/issues/2979 (i.e. in CircleCI builds USE_CUDA were defined by CI environment, so nobody ever checked the default, but this is not the case in Nova builds)

Pull Request resolved: https://github.com/pytorch/audio/pull/3005

Test Plan:
Check that `compute.cu` is mentioned in builds, for example see https://github.com/pytorch/audio/actions/runs/3990295262/jobs/6843771056#step:9:829
```
[193/202] /usr/local/cuda-11.6/bin/nvcc -forward-unknown-to-host-compiler -DINCLUDE_KALDI -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dlibtorchaudio_EXPORTS -I/__w/audio/audio/pytorch/audio -I/__w/audio/audio/pytorch/audio/third_party/kaldi/src -I/__w/audio/audio/pytorch/audio/third_party/kaldi/submodule/src -isystem=/__w/_temp/conda_environment_3990295262/lib/python3.7/site-packages/torch/include -isystem=/__w/_temp/conda_environment_3990295262/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem=/usr/local/cuda-11.6/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_50,code=compute_50 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -Xcompiler=-fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 -MD -MT torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o -MF torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o.d -x cu -c /__w/audio/audio/pytorch/audio/torchaudio/csrc/rnnt/gpu/compute.cu -o torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o
```

Reviewed By: mthrok

Differential Revision: D42687455

Pulled By: malfet

fbshipit-source-id: c37ad58cc62439d1268865e9bf0bcb97079a529f

09e7d818

21 Dec, 2022 1 commit

Extract libsox integration from libtorchaudio (#2929) · 1706a72f

moto authored Dec 21, 2022

Summary:
This commit makes the following changes to the C++ library organization
- Move sox-related feature implementations from `libtorchaudio` to `libtorchaudio_sox`.
- Remove C++ implementation of `is_sox_available` and `is_ffmpeg_available` as it is now sufficient to check the existence of `libtorchaudio_sox` and `libtorchaudio_ffmpeg` to check the availability. This makes `libtorchaudio_sox` and `libtorchaudio_ffmpeg` independent from `libtorchaudio`.
- Move PyBind11-based bindings (`_torchaudio_sox`, `_torchaudio_ffmpeg`) into `torchaudio.lib` so that the built library structure is less cluttered.

Background:
Originally, when the `libsox` was the only C++ extension and `libtorchaudio` was supposed to contain all the C++ code.
The things are different now. We have a bunch of C++ extensions and we need to make the code/build structure more modular.

The new `libtorchaudio_sox` contains the implementations and `_torchaudio_sox` contains the PyBin11-based bindings.

Pull Request resolved: https://github.com/pytorch/audio/pull/2929

Reviewed By: hwangjeff

Differential Revision: D42159594

Pulled By: mthrok

fbshipit-source-id: 1a0fbca9e4143137f6363fc001b2378ce6029aa7

1706a72f

12 Aug, 2022 1 commit

Introducing pytorch-cuda metapackage (#2612) · 776cf099

Andrey Talman authored Aug 12, 2022

Summary:
Introducing pytorch-cuda metapackage

Same as: https://github.com/pytorch/vision/pull/6371
Following PR: https://github.com/pytorch/builder/pull/1094
Adds cuda metapackage called pytorch-cuda . This way we can make sure to install correct version of cuda dependencies and don't depend on conda-forge.

Pull Request resolved: https://github.com/pytorch/audio/pull/2612

Reviewed By: hwangjeff, seemethere, nateanl

Differential Revision: D38633332

Pulled By: atalman

fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501

776cf099

29 Jul, 2022 1 commit

Enable CTC decoder in Windows (#2587) · 67cb420d

moto authored Jul 29, 2022

Summary:
This commit enables CTC decoder on Windows.

The functionality seems to work fine.
The tests are passing, the decoding tutorial runs fine.

The only difference to the Linux/macOS version is that
loading model in XZ compression format is not supported.

![289961785_399620772041679_7768117002438616376_n](https://user-images.githubusercontent.com/855818/181420923-cfbd8402-20de-4e63-b9e4-e39f9aa9fc50.png)

Pull Request resolved: https://github.com/pytorch/audio/pull/2587

Reviewed By: carolineechen, nateanl

Differential Revision: D38276490

Pulled By: mthrok

fbshipit-source-id: f2203b2235c5bbb0220fe560aaaf0e1d5530347a

67cb420d

28 Jul, 2022 1 commit

Migrate CTC decoder code (#2580) · 39b6343d

moto authored Jul 28, 2022

Summary:
This commit gets rid of our copy of CTC decoder code and
replace it with upstream Flashlight-Text repo.

Pull Request resolved: https://github.com/pytorch/audio/pull/2580

Reviewed By: carolineechen

Differential Revision: D38244906

Pulled By: mthrok

fbshipit-source-id: d274240fc67675552d19ff35e9a363b9b9048721

39b6343d

02 Jun, 2022 1 commit

Remove mad (#2428) · d2ecba98

moto authored Jun 02, 2022

Summary:
Remove the code related to libmad, which had been disabled in https://github.com/pytorch/audio/issues/2354

In https://github.com/pytorch/audio/issues/2419, we mp3 decoding to ffmpeg. But CI tests were still using libmad.
This commit completely removes libmad from torchaudio.

This is BC-breaking change as `apply_sox_effects_file` function cannot handle MP3, and it cannot fallback to ffmpeg.
The workaround for this is to use `torchaudio.load` then `apply_sox_effects_tensor`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2428

Reviewed By: carolineechen

Differential Revision: D36851805

Pulled By: mthrok

fbshipit-source-id: f98795c59a1ac61cef511f2bbeac37f7c3c69d55

d2ecba98

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
  - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments are changed.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.

## Code structure

The approach is very similar to how file-like object is supported in sox-based I/O.
In Streaming API if the input src is string, it is passed to the implementation bound with TorchBind,
if the src has `read` attribute, it is passed to the same implementation bound via PyBind 11.

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in `c10` namespace minimum. All the `c10::optional` and `c10::Dict` were converted to the equivalents of `std` at binding layer. But since they work fine with PyBind11, Streamer core methods deal them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d

13 May, 2022 1 commit

Move Streamer API out of prototype (#2378) · 72b712a1

moto authored May 13, 2022

Summary:
This commit moves the Streaming API out of prototype module.

* The related classes are renamed as following

  - `Streamer` -> `StreamReader`.
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

This change is preemptive measurement for the possibility to add
`StreamWriter` API.

* Replace BUILD_FFMPEG build arg with USE_FFMPEG

We are not building FFmpeg, so USE_FFMPEG is more appropriate

 ---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs: (different PRs)
- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
   - Rename `Streamer` to `StreamReader`.
   - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
   - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
   - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378

Reviewed By: carolineechen

Differential Revision: D36359299

Pulled By: mthrok

fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328

72b712a1

28 Apr, 2022 1 commit

Add BUILD_MAD option and default to OFF (#2354) · a71e3a40

moto authored Apr 28, 2022

Summary:
libmad integration should be enabled only from source-build

Pull Request resolved: https://github.com/pytorch/audio/pull/2354

Reviewed By: nateanl

Differential Revision: D36012035

Pulled By: mthrok

fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7

a71e3a40

30 Dec, 2021 1 commit

Add a switch to build ffmpeg binding (#2048) · ece03edc

moto authored Dec 30, 2021

Summary:
This PR adds `BUILD_FFMPEG` switch to torchaudio build process so that features related to ffmpeg are built.
The flag is false by default, so no CI jobs or development flow are affected.

This is because handling the dependencies around ffmpeg is a bit tricky.
Currently, the CMake file uses `pkg-config` to find an ffmpeg installation in the system.
This works fine for both conda-based installation and system-managed installation (like `apt`).

In subsequent PRs, I will find a solution that works for local development and binary distributions.

Pull Request resolved: https://github.com/pytorch/audio/pull/2048

Reviewed By: hwangjeff, nateanl

Differential Revision: D33367260

Pulled By: mthrok

fbshipit-source-id: 94517acecb62bd6d4e96d4b7cbc3ab3c2a25706c

ece03edc

23 Dec, 2021 1 commit

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

18 Dec, 2021 1 commit

Add FL Decoder / KenLM integration to build process (#2078) · 246dd52a

moto authored Dec 18, 2021

Summary:
After all the C++ code from https://github.com/pytorch/audio/issues/2072 are added, this commit will enable decoder/KenLM integration in the build process.

Pull Request resolved: https://github.com/pytorch/audio/pull/2078

Reviewed By: carolineechen

Differential Revision: D33198183

Pulled By: mthrok

fbshipit-source-id: 9d7fa76151d06fbbac3785183c7c2ff9862d3128

246dd52a

17 Dec, 2021 1 commit

Add static build of KenLM (#2076) · adc559a8

moto authored Dec 17, 2021

Summary:
Add KenLM and its dependencies required for static build (`zlib`, `bzip2`, `lzma` and `boost-thread`).

The KenLM and its dependencies are build but since no corresponding code on torchaudio side is changed, the resulting torchaudio extension module is not changed. (therefore, as long as build process passes on CI this PR should be good to go.)

Pull Request resolved: https://github.com/pytorch/audio/pull/2076

Reviewed By: carolineechen

Differential Revision: D33189980

Pulled By: mthrok

fbshipit-source-id: 6096113128b939f3cf70990c99aacc4aaa954584

adc559a8

30 Nov, 2021 1 commit

Allow whitespace as TORCH_CUDA_ARCH_LIST delimiter (#2050) · e83d4177

moto authored Nov 30, 2021

Summary:
Resolves https://github.com/pytorch/audio/issues/2049, https://github.com/pytorch/audio/issues/1940

Pull Request resolved: https://github.com/pytorch/audio/pull/2050

Reviewed By: nateanl

Differential Revision: D32712513

Pulled By: mthrok

fbshipit-source-id: e1db81786bcca67605ff765d27e0527e20967d1c

e83d4177

06 Oct, 2021 2 commits
- Add OpenMP support (#1761) · e3734fef
  moto authored Oct 06, 2021
  
  e3734fef
- Rename build_tools to tools (#1812) · 181f0c80
  moto authored Oct 05, 2021
  
  181f0c80