1. 22 Dec, 2021 1 commit
    • Revert linting exemptions introduced in #2071 (#2087) · 575d221e
      Joao Gomes authored
      Summary:
      After discussing with Moto Hira, we decided to revert the linting exemptions
      introduced previously, to keep the entire audio project consistently
      formatted and to reduce the time we spend on formatting discussions.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2087
      
      Reviewed By: mthrok
      
      Differential Revision: D33236949
      
      Pulled By: jdsgomes
      
      fbshipit-source-id: e13079f532c4534d8a168059b0ded6fa375fdecf
  2. 21 Dec, 2021 3 commits
    • Clean up CTC decoder binding code (#2092) · 4c2edd21
      Moto Hira authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2092
      
      Reviewed By: carolineechen
      
      Differential Revision: D33169110
      
      fbshipit-source-id: e422ad93efe50b91f1ac5d572dc82768c1000c05
    • Update audio augmentation tutorial (#2082) · 3a03d8c0
      moto authored
      Summary:
      1. Reorder the audio display so that audio clips are playable from the browser in the docs
      2. Add links to function documentation
      
      https://470342-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2082
      
      Reviewed By: carolineechen
      
      Differential Revision: D33227725
      
      Pulled By: mthrok
      
      fbshipit-source-id: c7ee360b6f9b84c8e0a9b72193b98487d03b57ab
    • Fix load behavior for 24-bit input (#2084) · 4554d242
      moto authored
      Summary:
      ## Bug description
      
      When 24 bits-per-sample audio is loaded via a file-like object,
      the loaded Tensor is wrong. It is fine if the audio is loaded
      from a local file.
      
      ## The cause of the bug
      
      The core of sox's decoding mechanism is the `sox_read` function,
      one of whose parameters is the maximum number of samples to decode
      from the given buffer.
      
      https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f
      
      The `sox_read` function is called in the callback of what is called
      the `drain` effect, and this callback receives the output buffer and
      its size in bytes. The previous implementation passed this size value
      as the argument of `sox_read` for the maximum number of samples to
      read. Since the buffer size is larger than the number of samples that
      fit in the buffer, the `sox_read` function always consumed the entire
      buffer. (This behavior is not wrong except when the input is
      24 bits-per-sample and comes from a file-like object.)
      
      When the input is read from a file-like object, inside the drain
      callback, new data are fetched via Python's `read` method and
      loaded into a fixed-size memory region. The size of this memory region
      can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`,
      but the default value is 8096.
      
      If the input format is 24 bits-per-sample, the end of the memory region
      does not necessarily correspond to the end of a valid sample.
      When `sox_read` consumes all the data in the buffer region, the data
      at the end introduces some unexpected values.
      This causes the aforementioned bug.
      
      ## Fix
      
      Pass a proper (better estimated) maximum number of decodable samples to
      `sox_read`.
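
      The alignment problem described above can be illustrated with a minimal
      Python sketch. This is not the actual C++ fix; it assumes packed,
      little-endian, signed 24-bit PCM (3 bytes per sample), and the helper
      names `count_whole_samples`, `decode_pcm24`, and `decode_stream` are
      hypothetical. It shows why a fixed-size byte buffer should only be
      decoded up to the last complete sample, with the remainder carried over.

```python
def count_whole_samples(num_bytes: int, bits_per_sample: int) -> int:
    """Return how many complete samples fit in num_bytes of packed PCM."""
    bytes_per_sample = bits_per_sample // 8
    return num_bytes // bytes_per_sample


def decode_pcm24(chunk: bytes) -> list:
    """Decode packed little-endian signed 24-bit PCM.

    A trailing partial sample (1 or 2 stray bytes) is ignored.
    """
    n = count_whole_samples(len(chunk), 24)
    return [
        int.from_bytes(chunk[3 * i : 3 * i + 3], "little", signed=True)
        for i in range(n)
    ]


def decode_stream(data: bytes, buffer_size: int) -> list:
    """Decode data in fixed-size chunks, as a file-like reader would.

    Bytes that do not form a complete sample at the end of a chunk are
    kept and prepended to the next chunk instead of being decoded.
    """
    samples = []
    leftover = b""
    for start in range(0, len(data), buffer_size):
        chunk = leftover + data[start : start + buffer_size]
        usable = count_whole_samples(len(chunk), 24) * 3
        samples.extend(decode_pcm24(chunk[:usable]))
        leftover = chunk[usable:]
    return samples
```

      With the 8096-byte buffer mentioned above, 8096 = 3 × 2698 + 2, so
      each full buffer ends with two bytes of an incomplete 24-bit sample.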
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2084
      
      Reviewed By: carolineechen
      
      Differential Revision: D33236947
      
      Pulled By: mthrok
      
      fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
  3. 20 Dec, 2021 3 commits
  4. 18 Dec, 2021 1 commit
  5. 17 Dec, 2021 4 commits
  6. 15 Dec, 2021 1 commit
  7. 11 Dec, 2021 1 commit
  8. 10 Dec, 2021 3 commits
  9. 08 Dec, 2021 1 commit
  10. 07 Dec, 2021 1 commit
  11. 04 Dec, 2021 1 commit
  12. 03 Dec, 2021 5 commits
  13. 02 Dec, 2021 1 commit
    • Refactor the library loading mechanism (#2038) · 9114e636
      moto authored
      Summary:
      (This is part of a refactor series, followed up by https://github.com/pytorch/audio/issues/2039 and https://github.com/pytorch/audio/issues/2040.
      The goal is to make it easy to add a new library artifact alongside `libtorchaudio`, as in https://github.com/pytorch/audio/pull/2048/commits/4ced990849e60f6d19e87ae22819b04d1726648e https://github.com/pytorch/audio/issues/2048 .)
      
      We plan to add prototype/beta third-party library integrations,
      which could be unstable (segfaults, missing dynamic library dependencies, etc.).
      
      If we add such integrations into the existing libtorchaudio,
      in the worst case, it will prevent users from even running `import torchaudio`.
      
      Instead, we would like to separate the prototype/beta integrations
      into separate libraries, so that such issues do not impact all users, only
      users who attempt to use these prototype/beta features.
      
      Say, a prototype feature `foo` is added in `torchaudio.prototype.foo`.
      The following initialization procedure will achieve the above mechanism.
      
      1. Place the library file `libtorchaudio_foo` in `torchaudio/lib`.
      2. In `torchaudio.prototype.foo.__init__.py`, load the `libtorchaudio_foo`.
      
      Note:
      The approach will be slightly different for fbcode, because of how buck deploys
      C++ libraries and because of its standardized environment, but the code change
      here is still applicable.
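
      The two-step procedure above can be sketched as follows. This is an
      illustration under stated assumptions, not the code from this change:
      the helpers `find_library` and `load_extension`, the suffix list, and
      the injected `loader` callable are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed shared-library suffixes for Linux, Windows, and macOS.
_LIB_SUFFIXES = (".so", ".pyd", ".dylib")


def find_library(lib_dir: Path, name: str) -> Optional[Path]:
    """Return the path of the library called `name` in `lib_dir`, if present."""
    for suffix in _LIB_SUFFIXES:
        candidate = lib_dir / f"{name}{suffix}"
        if candidate.exists():
            return candidate
    return None


def load_extension(lib_dir: Path, name: str, loader: Callable[[str], None]) -> bool:
    """Load an optional extension library only when it is requested.

    Returns False when the library file is absent, so a missing
    prototype library never affects the import of the main package.
    """
    path = find_library(lib_dir, name)
    if path is None:
        return False
    loader(str(path))
    return True
```

      In a hypothetical `torchaudio/prototype/foo/__init__.py`, `loader`
      could be `torch.ops.load_library` and `name` could be
      `"libtorchaudio_foo"`, so that a failure while loading that library
      only affects users who import `torchaudio.prototype.foo`.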
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2038
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D32682900
      
      Pulled By: mthrok
      
      fbshipit-source-id: 0f402a92a366fba8c2894a0fe01f47f8cdd51376
  14. 30 Nov, 2021 2 commits
  15. 24 Nov, 2021 3 commits
  16. 23 Nov, 2021 2 commits
  17. 22 Nov, 2021 3 commits
  18. 19 Nov, 2021 3 commits
  19. 18 Nov, 2021 1 commit
    • Add Emformer RNN-T model (#2003) · 78ce7010
      hwangjeff authored
      Summary:
      Adds a streaming-capable recurrent neural network transducer (RNN-T) model that uses Emformer for its transcription network. Includes two factory functions: one that allows for building a custom model, and one that builds a preconfigured base model.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2003
      
      Reviewed By: nateanl
      
      Differential Revision: D32440879
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 601cb1de368427f25e3b7d120e185960595d2360