1. 07 Jun, 2023 1 commit
  2. 05 Jun, 2023 1 commit
  3. 03 Jun, 2023 1 commit
  4. 20 May, 2023 1 commit
  5. 28 Apr, 2023 1 commit
    • Yuekai Zhang's avatar
      Add cuctc decoder (#3096) · 0a1801ed
      Yuekai Zhang authored
      Summary:
      This PR implements a CUDA based ctc prefix beam search decoder.
      
      Attach serveral benchmark results using V100 below:
      |decoder type| model |datasets       | decoding time (secs)| beam size | batch size | model unit | subsampling times | vocab size |
      |--------------|---------|------|-----------------|------------|-------------|------------|-----------------------|------------|
      | cuctc |  conformer nemo    |dev clean        |7.68s | 8           |  32       | bpe         |    4  | 1000|
      | cuctc |  conformer nemo   |dev clean  (sort by length)      |1.6s | 8           |  32       | bpe         |    4  | 1000|
      | cuctc |  wav2vec2.0 torchaudio |dev clean                                |22s | 10           |  1       | char         |    2  | 29|
      | cuctc |   conformer espnet   |aishell1 test                             | 5s | 10           |  24       | char         |    4  | 4233|
      
      Note:
      1.  The design is to parallel computation through batch and vocab axis, for loop the frames axis. So it's more friendly with smaller sequence lengths, larger vocab size comparing with CPU implementations.
      2. WER is the same as CPU implementations. However, it can't decode with LM now.
      
      Resolves: https://github.com/pytorch/audio/issues/2957.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3096
      
      Reviewed By: nateanl
      
      Differential Revision: D44709397
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
      0a1801ed
  6. 05 Apr, 2023 1 commit
  7. 04 Apr, 2023 1 commit
  8. 14 Feb, 2023 1 commit
  9. 09 Feb, 2023 1 commit
  10. 23 Jan, 2023 1 commit
    • Nikita Shulga's avatar
      Tweak `USE_CUDA` detection (#3005) · 09e7d818
      Nikita Shulga authored
      Summary:
      We don't need the presence of physical HW to compile with CUDA.
      
      Likely one of the causes of  https://github.com/pytorch/audio/issues/2979 (i.e. in CircleCI builds USE_CUDA were defined by CI environment, so nobody ever checked the default, but this is not the case in Nova builds)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3005
      
      Test Plan:
      Check that `compute.cu` is mentioned in builds, for example see https://github.com/pytorch/audio/actions/runs/3990295262/jobs/6843771056#step:9:829
      ```
      [193/202] /usr/local/cuda-11.6/bin/nvcc -forward-unknown-to-host-compiler -DINCLUDE_KALDI -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dlibtorchaudio_EXPORTS -I/__w/audio/audio/pytorch/audio -I/__w/audio/audio/pytorch/audio/third_party/kaldi/src -I/__w/audio/audio/pytorch/audio/third_party/kaldi/submodule/src -isystem=/__w/_temp/conda_environment_3990295262/lib/python3.7/site-packages/torch/include -isystem=/__w/_temp/conda_environment_3990295262/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem=/usr/local/cuda-11.6/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_50,code=compute_50 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -Xcompiler=-fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 -MD -MT torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o -MF torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o.d -x cu -c /__w/audio/audio/pytorch/audio/torchaudio/csrc/rnnt/gpu/compute.cu -o torchaudio/csrc/CMakeFiles/libtorchaudio.dir/rnnt/gpu/compute.cu.o
      ```
      
      Reviewed By: mthrok
      
      Differential Revision: D42687455
      
      Pulled By: malfet
      
      fbshipit-source-id: c37ad58cc62439d1268865e9bf0bcb97079a529f
      09e7d818
  11. 21 Dec, 2022 1 commit
    • moto's avatar
      Extract libsox integration from libtorchaudio (#2929) · 1706a72f
      moto authored
      Summary:
      This commit makes the following changes to the C++ library organization
      - Move sox-related feature implementations from `libtorchaudio` to `libtorchaudio_sox`.
      - Remove C++ implementation of `is_sox_available` and `is_ffmpeg_available` as it is now sufficient to check the existence of `libtorchaudio_sox` and `libtorchaudio_ffmpeg` to check the availability. This makes `libtorchaudio_sox` and `libtorchaudio_ffmpeg` independent from `libtorchaudio`.
      - Move PyBind11-based bindings (`_torchaudio_sox`, `_torchaudio_ffmpeg`) into `torchaudio.lib` so that the built library structure is less cluttered.
      
      Background:
      Originally, when the `libsox` was the only C++ extension and `libtorchaudio` was supposed to contain all the C++ code.
      The things are different now. We have a bunch of C++ extensions and we need to make the code/build structure more modular.
      
      The new `libtorchaudio_sox` contains the implementations and `_torchaudio_sox` contains the PyBin11-based bindings.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2929
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42159594
      
      Pulled By: mthrok
      
      fbshipit-source-id: 1a0fbca9e4143137f6363fc001b2378ce6029aa7
      1706a72f
  12. 12 Aug, 2022 1 commit
  13. 29 Jul, 2022 1 commit
  14. 28 Jul, 2022 1 commit
    • moto's avatar
      Migrate CTC decoder code (#2580) · 39b6343d
      moto authored
      Summary:
      This commit gets rid of our copy of CTC decoder code and
      replace it with upstream Flashlight-Text repo.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2580
      
      Reviewed By: carolineechen
      
      Differential Revision: D38244906
      
      Pulled By: mthrok
      
      fbshipit-source-id: d274240fc67675552d19ff35e9a363b9b9048721
      39b6343d
  15. 02 Jun, 2022 1 commit
    • moto's avatar
      Remove mad (#2428) · d2ecba98
      moto authored
      Summary:
      Remove the code related to libmad, which had been disabled in https://github.com/pytorch/audio/issues/2354
      
      In https://github.com/pytorch/audio/issues/2419, we mp3 decoding to ffmpeg. But CI tests were still using libmad.
      This commit completely removes libmad from torchaudio.
      
      This is BC-breaking change as `apply_sox_effects_file` function cannot handle MP3, and it cannot fallback to ffmpeg.
      The workaround for this is to use `torchaudio.load` then `apply_sox_effects_tensor`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2428
      
      Reviewed By: carolineechen
      
      Differential Revision: D36851805
      
      Pulled By: mthrok
      
      fbshipit-source-id: f98795c59a1ac61cef511f2bbeac37f7c3c69d55
      d2ecba98
  16. 21 May, 2022 1 commit
    • moto's avatar
      Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
        - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
        - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments are changed.
        - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
      
      ## Code structure
      
      The approach is very similar to how file-like object is supported in sox-based I/O.
      In Streaming API if the input src is string, it is passed to the implementation bound with TorchBind,
      if the src has `read` attribute, it is passed to the same implementation bound via PyBind 11.
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
        - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types in `c10` namespace minimum. All the `c10::optional` and `c10::Dict` were converted to the equivalents of `std` at binding layer. But since they work fine with PyBind11, Streamer core methods deal them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
      a984872d
  17. 13 May, 2022 1 commit
    • moto's avatar
      Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of prototype module.
      
      * The related classes are renamed as following
      
        - `Streamer` -> `StreamReader`.
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change is preemptive measurement for the possibility to add
      `StreamWriter` API.
      
      * Replace BUILD_FFMPEG build arg with USE_FFMPEG
      
      We are not building FFmpeg, so USE_FFMPEG is more appropriate
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
      72b712a1
  18. 28 Apr, 2022 1 commit
  19. 30 Dec, 2021 1 commit
    • moto's avatar
      Add a switch to build ffmpeg binding (#2048) · ece03edc
      moto authored
      Summary:
      This PR adds `BUILD_FFMPEG` switch to torchaudio build process so that features related to ffmpeg are built.
      The flag is false by default, so no CI jobs or development flow are affected.
      
      This is because handling the dependencies around ffmpeg is a bit tricky.
      Currently, the CMake file uses `pkg-config` to find an ffmpeg installation in the system.
      This works fine for both conda-based installation and system-managed installation (like `apt`).
      
      In subsequent PRs, I will find a solution that works for local development and binary distributions.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2048
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33367260
      
      Pulled By: mthrok
      
      fbshipit-source-id: 94517acecb62bd6d4e96d4b7cbc3ab3c2a25706c
      ece03edc
  20. 23 Dec, 2021 1 commit
  21. 18 Dec, 2021 1 commit
  22. 17 Dec, 2021 1 commit
    • moto's avatar
      Add static build of KenLM (#2076) · adc559a8
      moto authored
      Summary:
      Add KenLM and its dependencies required for static build (`zlib`, `bzip2`, `lzma` and `boost-thread`).
      
      The KenLM and its dependencies are build but since no corresponding code on torchaudio side is changed, the resulting torchaudio extension module is not changed. (therefore, as long as build process passes on CI this PR should be good to go.)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2076
      
      Reviewed By: carolineechen
      
      Differential Revision: D33189980
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6096113128b939f3cf70990c99aacc4aaa954584
      adc559a8
  23. 30 Nov, 2021 1 commit
  24. 06 Oct, 2021 2 commits