1. 11 Apr, 2022 1 commit
    • Fix ffmpeg integration for ffmpeg 5.0 (#2326) · bd319959
      moto authored
      Summary:
      This commit makes the FFmpeg integration support FFmpeg 5.0.
      
      In FFmpeg 5, functions such as `av_find_input_format` and `avformat_open_input` were changed
      so that they work with `const` pointers to `AVInputFormat`.
      
      > 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
      >  Constified the pointers to AVInputFormats and AVOutputFormats
      >  in AVFormatContext, avformat_alloc_output_context2(),
      >  av_find_input_format(), av_probe_input_format(),
      >  av_probe_input_format2(), av_probe_input_format3(),
      >  av_probe_input_buffer2(), av_probe_input_buffer(),
      >  avformat_open_input(), av_guess_format() and av_guess_codec().
      >  Furthermore, constified the AVProbeData in av_probe_input_format(),
      >  av_probe_input_format2() and av_probe_input_format3().
      
      https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2326
      
      Reviewed By: carolineechen
      
      Differential Revision: D35551380
      
      Pulled By: mthrok
      
      fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666
  2. 08 Apr, 2022 1 commit
    • Add devices/properties badges (#2321) · 72ae755a
      moto authored
      Summary:
      Add badges of supported properties and devices to functionals and transforms.
      
      This commit adds `.. devices::` and `.. properties::` directives to sphinx.
      
      APIs with these directives will have badges (based on shields.io) that link to the
      page describing these features.
      
      This is a continuation of https://github.com/pytorch/audio/issues/2316
      It leaves out dtypes for a later improvement and adds the badges to most functionals and transforms.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2321
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35489063
      
      Pulled By: mthrok
      
      fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
  3. 06 Apr, 2022 2 commits
  4. 05 Apr, 2022 2 commits
  5. 04 Apr, 2022 2 commits
  6. 01 Apr, 2022 5 commits
    • Fix loading checkpoint in hubert preprocessing (#2310) · 87f0d198
      Zhaoheng Ni authored
      Summary:
      When the checkpoint is on a GPU device and preprocessing runs on CPU, the script raises an exception. This fix loads the model state dictionary onto CPU by default.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2310
      
      Reviewed By: mthrok
      
      Differential Revision: D35316903
      
      Pulled By: nateanl
      
      fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed
    • Update GNU config files to support `arm64-apple` system (#2307) · 3ed39e15
      moto authored
      Summary:
      This commit
      1. Updates the config.guess and config.sub files and
      2. applies them to all the third party libraries that use them.
      
      This resolves the following build failure on M1 Macs with a newer SDK.
      
      On a MacBook Pro with the M1 chip, a recent OS update changed something
      in the development environment (probably a newer version of Xcode),
      and the build stopped working with the following error from
      third party dependencies.
      
      ```
      checking build system type... Invalid configuration ‘arm64-apple-darwin20.0.0': machine ‘arm64-apple' not recognized
      ```
      
      Note: the config files are taken from https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2307
      
      Reviewed By: nateanl
      
      Differential Revision: D35318273
      
      Pulled By: mthrok
      
      fbshipit-source-id: 746ac51dd1816767aa78b88445f76a29acfd29e8
    • Put CONDA_PREFIX second priority of ffmpeg search path (#2312) · 6a418a89
      moto authored
      Summary:
      Change the cmake logic to search CONDA_PREFIX before falling back
      to the other default paths and system paths.
      
      1. FFMPEG_ROOT
      2. CONDA_PREFIX
      3. Other locations (Package managers and system paths)
      
      For users with a regular conda installation, the ffmpeg from conda should
      be picked up automatically.
      Anyone who wants to use a specific ffmpeg can set the FFMPEG_ROOT
      variable to the location of the desired installation.
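
The priority order above can be modeled outside CMake as a small Python sketch. The function name `ffmpeg_search_roots` and the `default_roots` argument are hypothetical and only illustrate the ordering; the actual logic lives in the project's CMake files.

```python
def ffmpeg_search_roots(env: dict, default_roots: list) -> list:
    """Return candidate install roots in the priority order described above."""
    roots = []
    # 1. FFMPEG_ROOT, then 2. CONDA_PREFIX, each only if it is set.
    for var in ("FFMPEG_ROOT", "CONDA_PREFIX"):
        value = env.get(var)
        if value:
            roots.append(value)
    # 3. Everything else (package managers and system paths).
    return roots + list(default_roots)


# Illustrative values only; none of these paths come from the commit.
env = {"CONDA_PREFIX": "/opt/conda", "FFMPEG_ROOT": "/opt/ffmpeg"}
roots = ffmpeg_search_roots(env, ["/usr/local", "/usr"])
print(roots)
```

With both variables set, the explicit `FFMPEG_ROOT` comes first and the conda prefix second; with neither set, only the default locations remain.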
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2312
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35317383
      
      Pulled By: mthrok
      
      fbshipit-source-id: 52aef8f3f7f0f8f1eaf7a89a2d1ccfb6265e2c50
    • Refactor the internal of transforms module (#2309) · 72f9a4e3
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2309
      
      For the upcoming improved Kaldi features, which comprise multiple classes and functions, put all the transform implementations in a dedicated directory.
      
      Reviewed By: nateanl
      
      Differential Revision: D35303682
      
      fbshipit-source-id: 5bc8c07ef639683008c0f76ffe56e3941f772659
    • Loosen atol for melscale batch test for Windows (#2305) · d65a0f3e
      moto authored
      Summary:
      The `transforms.batch_consistency_test.TestTransforms` test is failing for Windows.
      
      https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272
      
      ```
      >       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
      E       AssertionError: Tensor-likes are not close!
      E
      E       Mismatched elements: 28 / 196608 (0.0%)
      E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
      E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
      ```
      
      The value of atol==1e-08 seems very strict, but all the other batch
      consistency tests are passing.
      
      The violation affects only a very small number of elements, which looks
      suspicious, but I think it is okay to loosen atol to `1e-06` for Windows.
      
      `1e-06` is still stricter than the tolerance used in the majority of the comparison tests we have.
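
To make the tolerance change concrete, the sketch below applies the usual closeness rule `|actual - expected| <= atol + rtol * |expected|` to the greatest absolute difference reported in the log above. The `expected` value is a made-up small number chosen for illustration; it is not taken from the failing test.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Tolerance rule: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical expected value; only `diff` comes from the failure log.
expected = 1e-4
diff = 2.0023435354232788e-07
actual = expected + diff

strict = is_close(actual, expected, rtol=1e-5, atol=1e-8)   # old atol
relaxed = is_close(actual, expected, rtol=1e-5, atol=1e-6)  # new atol on Windows
print(strict, relaxed)
```

For values this close to zero the `rtol` term contributes almost nothing, so `atol` alone decides whether the comparison passes.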
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2305
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35298056
      
      Pulled By: mthrok
      
      fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
  7. 31 Mar, 2022 2 commits
    • Randomize initial phase of sinusoid data in test (#2301) · c6c6b689
      moto authored
      Summary:
      This commit updates the `get_sinusoid` function in the test utility so that,
      when multiple channels are requested, every channel other than the first has
      a randomized initial phase.
      
      This adds some variety to the test data, which should not break the tests.
      Currently `get_sinusoid` returns identical waveforms for all the channels.
      Multi-channel support was added only to mock the input data so that
      it is easy to test features with multi-channel inputs, so tests should not
      expect all the channels to be identical.
      
      When working on numerical parity, it is more useful if the raw waveforms
      are somewhat different.
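
A minimal sketch of the idea, using only the standard library. The function `sinusoid_sketch` and its parameters are hypothetical; it is not the `get_sinusoid` implementation from the test utility, which returns tensors.

```python
import math
import random


def sinusoid_sketch(frequency=300.0, sample_rate=16000, n_samples=160,
                    n_channels=2, seed=0):
    """Return one list of samples per channel; only channel 0 has zero phase."""
    rng = random.Random(seed)
    channels = []
    for ch in range(n_channels):
        # The first channel keeps phase 0; the others get a random initial phase.
        phase = 0.0 if ch == 0 else rng.uniform(0.0, 2 * math.pi)
        channels.append([
            math.sin(2 * math.pi * frequency * n / sample_rate + phase)
            for n in range(n_samples)
        ])
    return channels


waveform = sinusoid_sketch()
print(waveform[0][:3], waveform[1][:3])
```

Seeding the generator keeps the test data reproducible while still making the channels differ from each other.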
      
      Image: waveforms generated by `get_sinusoid` after the change. left: 1st channel, right: 2nd channel
      <img width="524" alt="Screen Shot 2022-03-31 at 10 06 17 AM" src="https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png">
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2301
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35291689
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd
    • Move Kaldi comp tests to corresponding module (#2303) · ec552b69
      moto authored
      Summary:
      Tests for `torchaudio.compliance.kaldi` were scattered across different places.
      This commit puts all of them in the dedicated `test/torchaudio_unittest/compliance/kaldi/`
      directory.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2303
      
      Reviewed By: nateanl
      
      Differential Revision: D35288400
      
      Pulled By: mthrok
      
      fbshipit-source-id: 1426f236bc7786539d7a3110f992ad6220a52f46
  8. 30 Mar, 2022 3 commits
  9. 26 Mar, 2022 1 commit
  10. 25 Mar, 2022 3 commits
  11. 24 Mar, 2022 2 commits
  12. 22 Mar, 2022 3 commits
    • Revise the parameterization of third party libraries (#2282) · 7444f568
      moto authored
      Summary:
      Originally, the global property TORCHAUDIO_THIRD_PARTIES was introduced
      to handle the optional third party dependencies that can change based on
      the build config.
      
      After revising the CMake files, it turned out that this is not really necessary,
      because our torchaudio/csrc/CMakeLists.txt already branches properly on the
      conditional dependencies. It is better to leave the global scope untouched.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2282
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35059838
      
      Pulled By: mthrok
      
      fbshipit-source-id: ed3557eaa9a669e4466d64893beab5089eca78b8
    • Add download utility specialized for torchaudio (#2283) · 64b98521
      moto authored
      Summary:
      In recent updates, torchaudio added features that download assets/models from
      download.pytorch.org/torchaudio.
      
      To reduce code duplication, the implementations use utilities from
      ``torch.hub``, but some patterns are still repeated in implementing
      the fetch mechanism, notably cache and local file path handling.
      
      This commit introduces a utility function that handles
      download/cache/local path management and can be used for
      fetching pre-trained model data.
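
The repeated pattern can be sketched as a single helper that checks a local cache before downloading. Everything here is hypothetical: `fetch_asset`, its arguments, and the injected `download` callable are illustrations, not the utility added by this commit.

```python
from pathlib import Path


def fetch_asset(key: str, cache_dir: Path, download) -> Path:
    """Return a local path for `key`, calling `download` only on a cache miss."""
    path = cache_dir / key
    if not path.exists():
        # Create intermediate directories so nested keys like "models/a.pt" work.
        path.parent.mkdir(parents=True, exist_ok=True)
        download(key, path)  # hypothetical callable: (key, destination) -> None
    return path
```

Passing the download step in as a callable keeps the cache and path handling in one place and makes the helper testable without network access.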
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2283
      
      Reviewed By: carolineechen
      
      Differential Revision: D35050469
      
      Pulled By: mthrok
      
      fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b
    • Fix calculation of SNR value in tutorial (#2285) · 8395fe65
      Hagen Wierstorf authored
      Summary:
      The calculation of the SNR in the data augmentation examples seems wrong to me:
      
      ![image](https://user-images.githubusercontent.com/173624/159487032-c60470c6-ef8e-48a0-ad5e-a117fcb8d606.png)
      
      If we start from the definition of the signal-to-noise ratio using the root mean square value we get:
      
      ```
      SNR = 20 log10 ( rms(scale * speech) / rms(noise) )
      ```
      this can be transformed to
      ```
      scale = 10^(SNR/20) rms(noise) / rms(speech)
      ```
      The example uses `lambda x: x.norm(p=2)` instead of `rms`, but since the speech and noise signals have the same length, we have
      ```
      rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2)
      ```
      For the `e^(SNR / 10)` factor in the example to be correct, this would require:
      ```
      10^(SNR/20) = e^(SNR / 10)
      ```
      which is not true.
      
      Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`.
      
      For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41.
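
The quoted numbers can be reproduced with a few lines of Python. The function names `old_factor` and `new_factor` are mine; they compute only the SNR-dependent factor, leaving out the `rms(noise) / rms(speech)` ratio.

```python
import math


def old_factor(snr_db: float) -> float:
    # Factor used before the fix: e^(SNR / 10)
    return math.exp(snr_db / 10)


def new_factor(snr_db: float) -> float:
    # Factor after the fix: 10^(SNR / 20)
    return 10 ** (snr_db / 20)


for snr_db in (20, 10, 3):
    print(snr_db, round(old_factor(snr_db), 2), round(new_factor(snr_db), 2))
```

Plugging the new factor back into `20 log10(...)` recovers the requested SNR, which the old factor does not.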
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2285
      
      Reviewed By: nateanl
      
      Differential Revision: D35047737
      
      Pulled By: mthrok
      
      fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3
  13. 17 Mar, 2022 1 commit
  14. 10 Mar, 2022 3 commits
  15. 08 Mar, 2022 1 commit
  16. 06 Mar, 2022 1 commit
  17. 04 Mar, 2022 2 commits
    • Flush and reset internal state after seek (#2264) · 7e1afc40
      moto authored
      Summary:
      This commit adds the following behavior to `seek` so that `seek`
      works after a frame is decoded.
      
      1. Flush the decoder buffer.
      2. Recreate the filter graphs (so that the internal state is re-initialized).
      3. Discard the buffered tensors (decoded chunks).
      
      It also disallows negative values for the seek timestamp.
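
The three reset steps and the timestamp check can be pictured with a toy class. `ToyStreamer` and all of its attributes are hypothetical stand-ins; the real implementation works on FFmpeg decoder and filter graph objects.

```python
class ToyStreamer:
    """Hypothetical stand-in that mimics the reset steps performed on seek."""

    def __init__(self):
        self.decoder_buffer = []      # packets still held by the decoder
        self.filter_graph = object()  # stands in for a stateful filter graph
        self.chunks = []              # decoded chunks not yet returned
        self.position = 0.0

    def seek(self, timestamp: float) -> None:
        if timestamp < 0:
            raise ValueError("seek timestamp must be non-negative")
        self.decoder_buffer.clear()   # 1. flush the decoder buffer
        self.filter_graph = object()  # 2. recreate the filter graph
        self.chunks.clear()           # 3. discard the buffered chunks
        self.position = timestamp
```

Without these steps, data decoded before the seek would leak into the output produced after it.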
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2264
      
      Reviewed By: carolineechen
      
      Differential Revision: D34497826
      
      Pulled By: mthrok
      
      fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
    • Make Streamer fail if an invalid option is provided (#2263) · 04875eef
      moto authored
      Summary:
      The `torchaudio.prototype.io.Streamer` class takes context-dependent options
      as the `option` argument, in the form of a mapping of strings.
      
      Currently there is no check on whether the provided options are valid for
      the given input.
      
      This commit adds the check and raises an error if an invalid option is given.
      
      This is analogous to the error handling of the `ffmpeg` command.
      
      ```
      $ ffmpeg -foo
      ...
      Unrecognized option 'foo'.
      ```
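
One way to implement such a check is to consume the options that are recognized and fail if anything is left over. The sketch below shows that pattern; `open_input`, the `supported` set, and the error message are hypothetical and do not reflect the actual `Streamer` code.

```python
def open_input(src: str, option: dict, supported: set) -> None:
    """Hypothetical opener: consume known options, reject any leftovers."""
    remaining = dict(option)
    for key in list(remaining):
        if key in supported:
            remaining.pop(key)  # a real opener would apply the option here
    if remaining:
        unknown = ", ".join(sorted(remaining))
        raise RuntimeError(f"Unexpected options: {unknown}")
```

For example, `open_input("input.wav", {"foo": "1"}, {"sample_rate"})` raises, while passing only `sample_rate` succeeds.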
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2263
      
      Reviewed By: hwangjeff
      
      Differential Revision: D34495111
      
      Pulled By: mthrok
      
      fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
  18. 27 Feb, 2022 1 commit
  19. 26 Feb, 2022 3 commits
    • Enable ffmpeg prototype unit test (#2261) · 955ffb47
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2261
      
      Enables prototype ffmpeg io tests in fbcode.
      
      Reviewed By: nateanl
      
      Differential Revision: D33698353
      
      fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036
    • Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4
      Zhaoheng Ni authored
      Summary:
      This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``.
      The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
      The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.
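
The operation can be illustrated with plain Python complex numbers. This sketch assumes the common convention in which the enhanced spectrum is the sum over channels of the conjugated weight times the noisy spectrum; the function name, the nested-list layout, and the sample values are illustrative only.

```python
def apply_beamforming_sketch(weights, specgram):
    """weights[f][c], specgram[c][f][t] -> enhanced[f][t] = sum_c conj(w) * x."""
    n_channels = len(specgram)
    n_freqs = len(specgram[0])
    n_frames = len(specgram[0][0])
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t]
                for c in range(n_channels))
            for t in range(n_frames)
        ]
        for f in range(n_freqs)
    ]


# Two channels, one frequency bin, two frames (made-up values).
weights = [[0.5 + 0j, 0.5 + 0j]]
specgram = [[[1 + 1j, 2 + 0j]], [[3 - 1j, 0 + 2j]]]
enhanced = apply_beamforming_sketch(weights, specgram)
print(enhanced)
```

With equal real weights of 0.5 the result is simply the average of the two channels, which makes the output easy to check by hand.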
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2232
      
      Reviewed By: mthrok
      
      Differential Revision: D34474561
      
      Pulled By: nateanl
      
      fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
    • Improve device streaming (#2202) · 365313ed
      moto authored
      Summary:
      This commit adds a tutorial for device ASR and updates the API for device streaming.
      
      The changes to the interface are:
      1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods.
      2. Make the `fill_buffer` method private.
      
      When dealing with a device stream, there are situations where the device buffer is not
      ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
      `process_packet` method raised an exception in the Python layer, but for device ASR
      this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
      The new `timeout` parameter serves this purpose.
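
The retry behavior can be sketched in Python as a blocking loop with a deadline. This is only an illustration of the idea: the commit implements the retry in C++, and the names `DeviceNotReady` and `process_packet_with_retry`, as well as the use of seconds for `timeout` and `backoff`, are assumptions of this sketch.

```python
import time


class DeviceNotReady(Exception):
    """Hypothetical stand-in for the EAGAIN condition."""


def process_packet_with_retry(read_packet, timeout: float, backoff: float = 0.01):
    """Call read_packet until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return read_packet()
        except DeviceNotReady:
            if time.monotonic() >= deadline:
                raise  # give up and surface the error to the caller
            time.sleep(backoff)  # wait before asking the device again
```

Keeping the loop inside one call avoids raising and catching an exception in the caller for every packet that is not ready yet.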
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2202
      
      Reviewed By: nateanl
      
      Differential Revision: D34475829
      
      Pulled By: mthrok
      
      fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
  20. 25 Feb, 2022 1 commit