1. 12 Oct, 2023 1 commit
  2. 09 Oct, 2023 1 commit
  3. 02 Oct, 2023 1 commit
  4. 13 Sep, 2023 1 commit
  5. 30 Aug, 2023 1 commit
  6. 19 Aug, 2023 1 commit
  7. 02 Jun, 2023 1 commit
    • [BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5
      moto authored
      Summary:
      This commit removes the `compute_kaldi_pitch` function and the underlying Kaldi integration from torchaudio.
      
      The Kaldi pitch function was added within a short period of time by integrating the original Kaldi implementation rather than reimplementing it in PyTorch.
      
      The Kaldi integration employed a hack that replaced Kaldi's base vector/matrix implementation with PyTorch Tensor, so that there is only one BLAS library within torchaudio.
      
      We have recently been making torchaudio leaner, and the kaldi_pitch feature has not seen wide adoption, so we decided to remove it.
      
      See some of the discussion at https://github.com/pytorch/audio/issues/1269
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3368
      
      Differential Revision: D46406176
      
      Pulled By: mthrok
      
      fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
  8. 20 May, 2023 1 commit
  9. 14 Feb, 2023 1 commit
  10. 04 Feb, 2023 1 commit
  11. 29 Dec, 2022 2 commits
  12. 27 Dec, 2022 1 commit
    • Refactor Buffer implementation in StreamReader (#2939) · 4699ef21
      moto authored
      Summary:
      The `Buffer` class is responsible for converting `AVFrame` into `torch::Tensor` and storing the frames in accordance with `frames_per_chunk` and `buffer_chunk_size`.
      
      `Buffer` has four operating modes: [audio|video] x [chunked|unchunked]. Audio and video have separate class implementations, while chunked versus unchunked behavior depends on whether `frames_per_chunk<0`.
      
      In chunked mode, frames are returned in chunks of a fixed number of frames; in unchunked mode, frames are returned as-is.
      
      As frames accumulate, chunked mode drops old frames, while unchunked mode retains all of them.
      
      Currently, the underlying buffer implementation is the same in both modes: `std::deque<torch::Tensor>`. As we plan to make chunked-mode behavior more efficient by changing the underlying buffer container, it will be easier if the unchunked-mode behavior is kept as-is in a separate class.
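
The difference between the two modes can be illustrated with a minimal Python sketch. It is not the C++ implementation: the class names, the `make_buffer` helper and the `num_chunks` parameter (an assumed reading of `buffer_chunk_size` as a number of chunks) are hypothetical, and plain integers stand in for tensors.

```python
from collections import deque


class UnchunkedBuffer:
    """Keeps every pushed frame and returns them all as-is."""

    def __init__(self):
        self._frames = deque()

    def push(self, frame):
        self._frames.append(frame)

    def pop(self):
        frames = list(self._frames)
        self._frames.clear()
        return frames


class ChunkedBuffer:
    """Returns fixed-size chunks and drops the oldest frames on overflow."""

    def __init__(self, frames_per_chunk, num_chunks):
        self.frames_per_chunk = frames_per_chunk
        # A bounded deque silently discards the oldest item when it is full.
        self._frames = deque(maxlen=frames_per_chunk * num_chunks)

    def push(self, frame):
        self._frames.append(frame)

    def pop(self):
        # No chunk is returned until enough frames have accumulated.
        if len(self._frames) < self.frames_per_chunk:
            return None
        return [self._frames.popleft() for _ in range(self.frames_per_chunk)]


def make_buffer(frames_per_chunk, num_chunks):
    # Assumed rule from the text: a negative frames_per_chunk selects unchunked mode.
    if frames_per_chunk < 0:
        return UnchunkedBuffer()
    return ChunkedBuffer(frames_per_chunk, num_chunks)
```

Keeping the two behaviors in separate classes means the container behind the chunked variant can be swapped without touching the unchunked one, which is the motivation stated above.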
      
      This commit makes the following changes.
      
      * Change the `Buffer` class into a pure virtual class (interface).
      * Split `AudioBuffer` into `UnchunkedAudioBuffer` and `ChunkedAudioBuffer`.
      * Split `VideoBuffer` into `UnchunkedVideoBuffer` and `ChunkedVideoBuffer`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2939
      
      Reviewed By: carolineechen
      
      Differential Revision: D42247509
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7363e442a5b2db5dcbaaf0ffbfa702e088726d1b
  13. 21 Dec, 2022 1 commit
    • Extract libsox integration from libtorchaudio (#2929) · 1706a72f
      moto authored
      Summary:
      This commit makes the following changes to the C++ library organization:
      - Move sox-related feature implementations from `libtorchaudio` to `libtorchaudio_sox`.
      - Remove C++ implementation of `is_sox_available` and `is_ffmpeg_available` as it is now sufficient to check the existence of `libtorchaudio_sox` and `libtorchaudio_ffmpeg` to check the availability. This makes `libtorchaudio_sox` and `libtorchaudio_ffmpeg` independent from `libtorchaudio`.
      - Move PyBind11-based bindings (`_torchaudio_sox`, `_torchaudio_ffmpeg`) into `torchaudio.lib` so that the built library structure is less cluttered.
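
One way to check availability by existence, as described in the list above, is to look a module up without importing it. The sketch below uses only the standard library; the `is_extension_available` helper is hypothetical, and the dotted module path in the usage note is an assumption based on the names mentioned in this commit.

```python
import importlib.util


def is_extension_available(module_name):
    """Return True if the named module can be located, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False
```

A call such as `is_extension_available("torchaudio.lib._torchaudio_sox")` would then stand in for a dedicated C++ function like `is_sox_available`. Note that looking up a dotted name imports the parent packages.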
      
      Background:
      Originally, `libsox` was the only C++ extension and `libtorchaudio` was supposed to contain all the C++ code.
      Things are different now. We have a number of C++ extensions and we need to make the code/build structure more modular.
      
      The new `libtorchaudio_sox` contains the implementations and `_torchaudio_sox` contains the PyBind11-based bindings.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2929
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42159594
      
      Pulled By: mthrok
      
      fbshipit-source-id: 1a0fbca9e4143137f6363fc001b2378ce6029aa7
  14. 21 Sep, 2022 1 commit
  15. 01 Sep, 2022 1 commit
  16. 24 Aug, 2022 1 commit
    • Add StreamWriter (#2628) · 72404de9
      moto authored
      Summary:
      This commit adds the FFmpeg-based encoder class StreamWriter.
      StreamWriter is essentially the counterpart of the StreamReader class, and
      it supports:
      
      * Encoding audio / still image / video
      * Exporting to local file / streaming protocol / devices, etc.
      * File-like object support (in later commit)
      * HW video encoding (in later commit)
      
      See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2628
      
      Reviewed By: nateanl
      
      Differential Revision: D38816650
      
      Pulled By: mthrok
      
      fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8
  17. 19 Aug, 2022 1 commit
    • Refactor sox pybind source code (#2636) · 789adf07
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2636
      
      At the early stage of the torchaudio extension module, the
      `torchaudio/csrc/pybind` directory was created so that
      all the code defining the Python interface would be placed
      there and there would be only one extension module, called
      `torchaudio._torchaudio`.
      
      However, the codebase has evolved so that separate
      extensions are defined for each feature (third-party
      dependency) for the sake of a more modular file organization.
      
      What is left in `csrc/pybind` is the libsox Python bindings.
      This commit moves them under `csrc/sox`.
      
      Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.
      
      Reviewed By: carolineechen
      
      Differential Revision: D38829253
      
      fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d
  18. 28 Jul, 2022 2 commits
  19. 08 Jul, 2022 1 commit
  20. 27 Jun, 2022 1 commit
  21. 01 Jun, 2022 1 commit
  22. 28 May, 2022 1 commit
    • Update I/O initialization (#2417) · 65ab62e6
      moto authored
      Summary:
      Attempt to load the ffmpeg extension at the top-level import.
      
      This is preparation for using ffmpeg-based I/O as a fallback for the sox_io backend.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2417
      
      Reviewed By: carolineechen
      
      Differential Revision: D36736989
      
      Pulled By: mthrok
      
      fbshipit-source-id: 0beb6f459313b5ea91597393ccb12571444c54d9
  23. 27 May, 2022 1 commit
    • Refactor Streamer to StreamReader in C++ codebase (#2403) · 9ef6c23d
      moto authored
      Summary:
      * `Streamer` was renamed to `StreamReader` when it was moved from prototype to beta.
      This commit applies the same name change to the C++ source code.
      
      * Fix miscellaneous lint issues
      
      * Make the code compilable on FFmpeg 5
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2403
      
      Reviewed By: carolineechen
      
      Differential Revision: D36613053
      
      Pulled By: mthrok
      
      fbshipit-source-id: 69fedd6720d488dadf4dfe7d375ee76d216b215d
  24. 21 May, 2022 1 commit
    • Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to the Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without a `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to specify which decoder to use.
        - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
        - So that the arguments common to both audio and video come before the rest, the order of the arguments was changed.
        - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
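
As an illustration of the interface described in the list above, here is a minimal file-like object that implements `read(self, n)` and `seek(self, offset, whence)` over an in-memory payload. The `InMemorySource` class is hypothetical and only sketches the expected shape of such an object; it is not part of the Streaming API.

```python
import io


class InMemorySource:
    """Hypothetical file-like object exposing read and seek over a bytes payload."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)

    def read(self, n):
        # Return up to n bytes; an empty bytes object signals end of data.
        return self._stream.read(n)

    def seek(self, offset, whence=io.SEEK_SET):
        # Return the new absolute position, as standard io streams do.
        return self._stream.seek(offset, whence)
```

An object that implements only `read` would still satisfy the minimum requirement, but, as noted above, some formats cannot be decoded properly without `seek`.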
      
      ## Code structure
      
      The approach is very similar to how file-like objects are supported in sox-based I/O.
      In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
      if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
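
The dispatch rule described above can be sketched in a few lines of Python. The `select_binding` function and the returned labels are hypothetical; the actual code passes `src` to the bound implementation rather than returning a label.

```python
import io


def select_binding(src):
    """Pick which binding would handle src, following the rule described above."""
    if isinstance(src, str):
        # Path or URL strings go to the TorchBind-bound implementation.
        return "torchbind"
    if hasattr(src, "read"):
        # File-like objects go to the PyBind11-bound implementation.
        return "pybind11"
    raise TypeError("src must be a string or an object with a read method")


# Usage: a path string and an in-memory stream take different routes.
print(select_binding("audio.wav"))
print(select_binding(io.BytesIO(b"")))
```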
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some of the implementation in the original TorchBind surface layer was converted to a wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
        - `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` values were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
  25. 19 May, 2022 1 commit
    • Refactor Streamer implementation (#2402) · eed57534
      moto authored
      Summary:
      * Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be reused in PyBind11.
      * Move `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This makes the PyBind11-based binding simpler, as it does not need to deal with dtype.
      * Move `add_[audio|video]_stream` wrapper signature to Streamer core, so that Streamer directly deals with `c10::optional`.†
      
      † Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as an empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the corresponding filter expression that is actually being used.
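
To illustrate why the "basic" methods reduce to string manipulation, the sketch below assembles an FFmpeg audio filter description in Python. The `build_audio_filter` helper and its parameters are hypothetical and are not the torchaudio code; `aresample`, `aformat` and `anull` are standard FFmpeg audio filters.

```python
def build_audio_filter(sample_rate=None, sample_fmt=None):
    """Hypothetical helper that assembles an FFmpeg audio filter description."""
    parts = []
    if sample_rate is not None:
        parts.append(f"aresample={sample_rate}")
    if sample_fmt is not None:
        parts.append(f"aformat=sample_fmts={sample_fmt}")
    # With nothing requested, fall back to FFmpeg's pass-through audio filter,
    # "anull" ("null" is the video counterpart).
    return ",".join(parts) if parts else "anull"
```

For example, `build_audio_filter(8000, "fltp")` yields `aresample=8000,aformat=sample_fmts=fltp`, and a call with no arguments yields `anull`, the expression the footnote says is now reported instead of an empty string.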
      
      Ref https://github.com/pytorch/audio/issues/2400
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2402
      
      Reviewed By: nateanl
      
      Differential Revision: D36488808
      
      Pulled By: mthrok
      
      fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
  26. 13 May, 2022 1 commit
    • Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of prototype module.
      
      * The related classes are renamed as follows:
      
        - `Streamer` -> `StreamReader`.
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change is a preemptive measure for the possible addition of a
      `StreamWriter` API.
      
      * Replace BUILD_FFMPEG build arg with USE_FFMPEG
      
      We are not building FFmpeg, so USE_FFMPEG is more appropriate
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
  27. 06 May, 2022 1 commit
    • Use custom FFmpeg libraries for torchaudio binary distributions (#2355) · b7624c60
      moto authored
      Summary:
      This commit changes the way torchaudio binary distributions are built.
      
      * For all the binary distributions (conda/pip on Linux/macOS/Windows), build custom FFmpeg libraries.
      * The custom FFmpeg libraries do not use `--enable-gpl` or `--enable-nonfree`, so that they stay LGPL.
      * The custom FFmpeg libraries employ rpath so that the torchaudio binary distributions look for the corresponding FFmpeg libraries installed in the runtime environment.
      * The torchaudio binary build process will use them to bootstrap its build process.
      * The custom FFmpeg libraries are NOT shipped.
      
      This commit also adds a disclaimer about FFmpeg to the README.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2355
      
      Reviewed By: nateanl
      
      Differential Revision: D36202087
      
      Pulled By: mthrok
      
      fbshipit-source-id: c30e5222ba190106c897e42f567cac9152dbd8ef
  28. 26 Apr, 2022 1 commit
  29. 22 Mar, 2022 1 commit
    • Revise the parameterization of third party libraries (#2282) · 7444f568
      moto authored
      Summary:
      Originally, the global property TORCHAUDIO_THIRD_PARTIES was introduced
      to handle the optional third party dependencies that can change based on
      the build config.
      
      After revising the CMake configuration, it turned out that this is not really necessary,
      as our torchaudio/csrc/CMakeLists.txt properly branches out for
      conditional dependencies. Instead, we should leave the global scope untouched.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2282
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35059838
      
      Pulled By: mthrok
      
      fbshipit-source-id: ed3557eaa9a669e4466d64893beab5089eca78b8
  30. 15 Feb, 2022 1 commit
    • Improve ffmpeg library discovery (#2204) · 963905e4
      moto authored
      Summary:
      This commit fixes issues with ffmpeg discovery at build time.
      The original implementation had the following issues:
      
      1. Wrong usage of FindFFMPEG, which caused a mixture of ffmpeg libraries from the system directory and the user directory.
      2. The optional `FFMPEG_ROOT` variable was not set within cmake.
      
      Issue 1 is problematic when a user does not have permission to
      modify the environment. For example, if an old version of ffmpeg is
      installed in a directory managed by the system (such as `/usr/local/lib`),
      there is no way to specify a path in which the user installs a supported version
      of ffmpeg.
      
      This commit changes the behavior by first searching for the library
      in the path given by the `FFMPEG_ROOT` environment variable, then
      resorting to the original behavior of searching the custom paths together with
      the system default path.
      
      Also, this commit removes support for `libavresample`, which is deprecated in
      ffmpeg 4 and removed in ffmpeg 5.
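
The lookup order described above can be sketched in Python for illustration. This is not the CMake logic itself: the `find_library_dir` helper is hypothetical, and the `<FFMPEG_ROOT>/lib` layout is an assumption.

```python
import os
from pathlib import Path


def find_library_dir(lib_name, default_dirs, env=None):
    """Return the first directory containing lib_name, preferring FFMPEG_ROOT."""
    env = os.environ if env is None else env
    candidates = []
    root = env.get("FFMPEG_ROOT")
    if root:
        # Assumed layout: libraries live under <FFMPEG_ROOT>/lib.
        candidates.append(Path(root) / "lib")
    # Fall back to the custom and system default paths, in the given order.
    candidates.extend(Path(d) for d in default_dirs)
    for directory in candidates:
        if (directory / lib_name).is_file():
            return directory
    return None
```

Because the user-specified root is consulted first, a supported ffmpeg installed in a user directory takes precedence over an older copy in a system-managed directory.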
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2204
      
      Reviewed By: carolineechen
      
      Differential Revision: D34225769
      
      Pulled By: mthrok
      
      fbshipit-source-id: 95b0bfaaef31e2e69e6df29f789010f48a48210b
  31. 27 Jan, 2022 1 commit
    • Add no lm support for CTC decoder (#2174) · 4c3fa875
      Caroline Chen authored
      Summary:
      Add support for a CTC lexicon decoder without a language model by adding a placeholder language model, `ZeroLM`, that returns a score of 0 for everything. Generalize the decoder class/API slightly to support this, adding it as an option for the kenlm decoder for now (it will likely be separated from kenlm when support for other kinds of LMs is added).
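
The idea of a language model that always scores 0 can be sketched as follows. The `start`/`score`/`finish` interface and the `hypothesis_score` function are assumptions made for illustration; they are not the decoder's actual API.

```python
class ZeroLM:
    """Sketch of a placeholder language model that contributes nothing to the score."""

    def start(self):
        # Initial state; no context is tracked.
        return ()

    def score(self, state, token):
        # Every token scores 0, so only the emission scores matter.
        return state, 0.0

    def finish(self, state):
        return state, 0.0


def hypothesis_score(lm, tokens, emission_score, lm_weight=1.0):
    """Combine an emission score with weighted language model scores."""
    state = lm.start()
    lm_total = 0.0
    for token in tokens:
        state, token_score = lm.score(state, token)
        lm_total += token_score
    _, final_score = lm.finish(state)
    lm_total += final_score
    return emission_score + lm_weight * lm_total
```

With `ZeroLM`, the combined score equals the emission score regardless of `lm_weight`, which is what allows the same decoder code path to run without a real language model.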
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2174
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33798674
      
      Pulled By: carolineechen
      
      fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
  32. 02 Jan, 2022 1 commit
  33. 30 Dec, 2021 1 commit
    • Add a switch to build ffmpeg binding (#2048) · ece03edc
      moto authored
      Summary:
      This PR adds `BUILD_FFMPEG` switch to torchaudio build process so that features related to ffmpeg are built.
      The flag is false by default, so no CI jobs or development flow are affected.
      
      This is because handling the dependencies around ffmpeg is a bit tricky.
      Currently, the CMake file uses `pkg-config` to find an ffmpeg installation in the system.
      This works fine for both conda-based installation and system-managed installation (like `apt`).
      
      In subsequent PRs, I will find a solution that works for local development and binary distributions.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2048
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33367260
      
      Pulled By: mthrok
      
      fbshipit-source-id: 94517acecb62bd6d4e96d4b7cbc3ab3c2a25706c
  34. 20 Dec, 2021 1 commit
  35. 18 Dec, 2021 1 commit
  36. 17 Dec, 2021 1 commit
  37. 04 Dec, 2021 1 commit
  38. 03 Dec, 2021 1 commit
    • Clean up libtorchaudio customization logic (#2039) · a401dcb8
      moto authored
      Summary:
      (See https://github.com/pytorch/audio/issues/2038 description for the overall goal.)
      This PR cleans up CMake customization logic for `libtorchaudio`.
      
      It introduces base variables LIBTORCHAUDIO_INCLUDE_DIRS,
      LIBTORCHAUDIO_LINK_LIBRARIES and LIBTORCHAUDIO_COMPILE_DEFINITIONS,
      which are respectively used when calling `target_include_directories`,
      `target_link_libraries` and `target_compile_definitions`.
      
      The customization logic only modifies these variables.
      
      The original implementation called these functions multiple times
      (once per customization logic), which was making the customization logic
      difficult to understand.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2039
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D32683004
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4d41f21692ac139b1185a6ab69eb45d881ee7e73