1. 15 May, 2022 1 commit
    • [codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best-effort to move associated
      comments to match merged elements, but there are known limitations due to
      the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
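
A minimal sketch of the ordering difference, using only the Python standard library. It does not call µsort: `str.casefold` stands in for the case-insensitive key, the sample names are made up, and µsort's actual tie-breaking between names that differ only in case may not match the stable order shown here.

```python
names = ["frog", "Zebra", "FROG", "apple", "Frog"]

# Default comparison is case-sensitive: uppercase letters sort before
# lowercase ones, so "frog" is separated from "FROG" and "Frog".
case_sensitive = sorted(names)

# Case-insensitive key: the three spellings of "frog" end up adjacent.
# sorted() is stable, so equal keys keep their original relative order.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```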
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
      
      Reviewed By: lisroach
      
      Differential Revision: D36402214
      
      fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
  2. 13 May, 2022 1 commit
    • Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of prototype module.
      
      * The related classes are renamed as follows:
      
        - `Streamer` -> `StreamReader`.
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change is a preemptive measure in case a `StreamWriter` API is
      added later.
      
      * Replace BUILD_FFMPEG build arg with USE_FFMPEG
      
      We are not building FFmpeg, so USE_FFMPEG is more appropriate
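
The class renames listed above could be applied to client code with a small text rewrite. The sketch below is only an illustration: `rename_identifiers` is a hypothetical helper, not part of torchaudio, and it rewrites plain text without parsing Python, so it would also touch matching words in comments and strings.

```python
import re

# Old prototype names mapped to the new names listed in this commit.
RENAMES = {
    "Streamer": "StreamReader",
    "SourceStream": "StreamReaderSourceStream",
    "SourceAudioStream": "StreamReaderSourceAudioStream",
    "SourceVideoStream": "StreamReaderSourceVideoStream",
    "OutputStream": "StreamReaderOutputStream",
}

# Match whole identifiers only, so already-renamed names are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_identifiers(source: str) -> str:
    """Hypothetical helper: rewrite old class names in a piece of source text."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


old = "s = Streamer(src); info: SourceAudioStream = s.get_src_stream_info(0)"
new = rename_identifiers(old)
print(new)
```

Because the pattern requires word boundaries on both sides, running the helper twice gives the same result as running it once.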
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
  3. 12 May, 2022 3 commits
    • Use module-level `__getattr__` to implement delayed initialization (#2377) · 9499f642
      moto authored
      Summary:
      This commit updates the lazy module initialization logic for
      `torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`.
      
      - The modules are importable regardless of optional dependencies.
      i.e. `import torchaudio.prototype.io` does not trigger the check for
      optional dependencies.
      
      - Optional dependencies are checked when the actual
      API is imported for the first time.
      i.e. `from torchaudio.prototype.io import Streamer` triggers the check
      for optional dependencies.
      
      The downside is that:
      
      - `import torchaudio.prototype.io.Streamer` no longer works.
      
      ## Details:
      
      Starting from Python 3.7, modules can have a `__getattr__` function,
      which serves as a fallback if the import mechanism cannot find the
      attribute.
      
      This can be used to implement lazy import.
      
      ```python
      def __getattr__(name):
          global pi
          if name == 'pi':
              import math
              pi = math.pi
              return pi
          raise AttributeError(...)
      ```
      
      Ref: https://twitter.com/raymondh/status/1094686528440168453
      
      The implementation performs lazy import for the APIs that work with
      external/optional dependencies. In addition, it ensures that the
      binding is initialized only once.
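
A self-contained sketch of that pattern, not the torchaudio implementation. The module name `lazy_io_demo`, the helper `make_lazy_module`, and the names `FooFeature`/`BarFeature` are hypothetical, and standard-library classes stand in for APIs backed by optional dependencies. The "binding initialization" is reduced to a counter so that the once-only behaviour can be observed.

```python
import importlib
import sys
import types


def make_lazy_module(name, lazy_attrs):
    """Build a module whose listed attributes are resolved on first access."""
    module = types.ModuleType(name)
    state = {"initialized": False, "init_count": 0}

    def _lazy_getattr(attr):
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        if not state["initialized"]:
            # Stand-in for the optional-dependency check / binding setup.
            state["init_count"] += 1
            state["initialized"] = True
        target_module, target_name = lazy_attrs[attr]
        value = getattr(importlib.import_module(target_module), target_name)
        setattr(module, attr, value)  # cache, so the fallback is not hit again
        return value

    # A function stored as `__getattr__` in the module namespace is the
    # fallback used when normal attribute lookup fails.
    module.__getattr__ = _lazy_getattr
    module._state = state
    return module


demo = make_lazy_module(
    "lazy_io_demo",
    {"FooFeature": ("fractions", "Fraction"), "BarFeature": ("decimal", "Decimal")},
)
sys.modules[demo.__name__] = demo

# Both names come from one flat namespace; the setup runs on first access only.
from lazy_io_demo import BarFeature, FooFeature

print(FooFeature.__name__, BarFeature.__name__, demo._state["init_count"])
```

Creating the module and registering it in `sys.modules` triggers nothing; the setup only runs when one of the lazy names is first requested.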
      
      ## Why is this the preferable approach?
      
      Previously, the optional dependencies were checked at the time the
      module was imported:
      
      https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5
      
      As long as this module is in `prototype`, which we ask users to import
      explicitly, users had control over whether or not to install
      the optional dependencies.
      
      This approach only works for one optional dependency per module.
      Say we add a different I/O library as an optional dependency: we would
      need to put all of its APIs in a dedicated submodule. This prevents us
      from having a flat namespace,
      i.e. the I/O modules with multiple optional dependencies would look like
      
      ```python
      # Client code
      from torchaudio.io.foo import FooFeature
      from torchaudio.io.bar import BarFeature
      ```
      
      where the new approach would allow
      
      ```python
      # Client code
      from torchaudio.io import FooFeature, BarFeature
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2377
      
      Reviewed By: nateanl
      
      Differential Revision: D36305603
      
      Pulled By: mthrok
      
      fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae
    • Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680
      Zhaoheng Ni authored
      Summary:
      - When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, the value can sometimes be negative, which results in an empty label. Such a training example hurts performance after zero-padding (i.e., the labels are all zero for the input waveform).
      This PR fixes the bug by checking whether `label_start` is negative and setting it to zero if so.
      - If `pad` is True, `length` should be the length of each waveform instead of the max length. Fix it so that the model ignores the padding component in pre-training.
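
The first fix can be sketched in plain Python, where `//` floors toward negative infinity like `rounding_mode="floor"`. This is not the recipe's code: the function names are hypothetical, and the example values assume `kernel_size` and `stride` are given in milliseconds and `sample_rate` in samples per millisecond.

```python
def raw_label_start(audio_start, kernel_size, stride, sample_rate):
    """Floor division, mirroring the formula quoted in the commit message."""
    return (audio_start - kernel_size * sample_rate) // (stride * sample_rate)


def clamped_label_start(audio_start, kernel_size, stride, sample_rate):
    """Same formula, but a negative start index is replaced by zero."""
    return max(raw_label_start(audio_start, kernel_size, stride, sample_rate), 0)


# Assumed example values: 25 ms kernel, 20 ms stride, 16 samples per ms.
early = raw_label_start(100, kernel_size=25, stride=20, sample_rate=16)
fixed = clamped_label_start(100, kernel_size=25, stride=20, sample_rate=16)
later = clamped_label_start(1000, kernel_size=25, stride=20, sample_rate=16)
print(early, fixed, later)
```

A crop that starts within the first convolution window yields a negative raw index; clamping maps it to the first label frame instead.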
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2296
      
      Reviewed By: mthrok
      
      Differential Revision: D36323217
      
      Pulled By: nateanl
      
      fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
    • [black][codemod] formatting changes from black 22.3.0 · 595dc5d3
      John Reese authored
      Summary:
      Applies the black-fbsource codemod with the new build of pyfmt.
      
      paintitblack
      
      Reviewed By: lisroach
      
      Differential Revision: D36324783
      
      fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
  4. 11 May, 2022 1 commit
    • Ignore TempDir clean up error (#2379) · f35ad461
      moto authored
      Summary:
      On CircleCI, Windows unittests are failing for Python 3.7 with
      `PermissionError` at the end of test when it cleans up temporary
      directory.
      
      According to the discussion https://github.com/python/cpython/issues/74168,
      this is caused by a known issue with `shutil.rmtree`.
      
      In the above thread it is advised to simply ignore the error as it
      is not guaranteed that temp directories are cleaned up.
      
      This commit follows the same path and simply ignores the error
      so that our CI gets back to green.
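
One way to ignore such a cleanup error is sketched below. `LenientTempDir` is a hypothetical wrapper written for illustration and is not the code from this commit. On Python 3.10 and later, `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` offers similar behaviour directly, but that parameter is not available on Python 3.7.

```python
import os
import tempfile


class LenientTempDir:
    """Hypothetical wrapper: cleanup never propagates PermissionError."""

    def __init__(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.name = self._tmp.name

    def cleanup(self):
        try:
            self._tmp.cleanup()
        except PermissionError:
            # Can happen on Windows when a file is still held open.
            # Leaving the directory behind is acceptable for tests.
            pass


tmp = LenientTempDir()
path = os.path.join(tmp.name, "example.txt")
with open(path, "w") as f:
    f.write("data")
tmp.cleanup()
print(os.path.exists(path))
```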
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2379
      
      Reviewed By: carolineechen
      
      Differential Revision: D36305595
      
      Pulled By: mthrok
      
      fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5
  5. 10 May, 2022 5 commits
    • Add ConvEmformer module (#2358) · 2c79b55a
      hwangjeff authored
      Summary:
      Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241.
      
      Continuation of https://github.com/pytorch/audio/issues/2324.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2358
      
      Reviewed By: nateanl, xiaohui-zhang
      
      Differential Revision: D36137992
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
    • Fix return dtype in MVDR module (#2376) · 2f4eb4ac
      Zhaoheng Ni authored
      Summary:
      Address https://github.com/pytorch/audio/issues/2375
      The MVDR module internally converts complex tensors to `torch.complex128` for computation and is meant to convert the result back to the original dtype before returning the Tensor. However, the conversion back had no effect because the code called `specgram_enhanced.to(dtype)` without assigning the result; `Tensor.to` returns a new tensor, so it should be `specgram_enhanced = specgram_enhanced.to(dtype)`. This commit fixes it so that the output dtype is consistent with the original input.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2376
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36280851
      
      Pulled By: nateanl
      
      fbshipit-source-id: 553d1b98f899547209a4e3ebc59920c7ef1f3112
    • Add RTFMVDR module (#2368) · 4b021ae3
      Zhaoheng Ni authored
      Summary:
      Add a new design of MVDR module.
      The RTFMVDR module supports the method based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
      The input arguments are:
      - multi-channel spectrum.
      - RTF vector of the target speech
      - PSD matrix of noise.
      - reference channel in the microphone array.
      - diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
      - diag_eps for computing the inverse of the matrix.
      - eps for computing the beamforming weight.
      The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
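
For orientation, the textbook form of the RTF-based MVDR weight is given below. The notation is an assumption rather than something taken from the commit: $\mathbf{v}(f)$ is the RTF vector, $\mathbf{\Phi}_{NN}(f)$ the noise PSD matrix, and $\mathbf{Y}(f,t)$ the multi-channel spectrum. How the reference channel, `diag_eps` and `eps` enter is implementation-specific and is not shown.

```latex
\mathbf{w}_{\mathrm{MVDR}}(f)
  = \frac{\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{v}(f)}
         {\mathbf{v}^{\mathsf{H}}(f)\,\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{v}(f)},
\qquad
\hat{S}(f,t) = \mathbf{w}_{\mathrm{MVDR}}^{\mathsf{H}}(f)\,\mathbf{Y}(f,t)
```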
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2368
      
      Reviewed By: carolineechen
      
      Differential Revision: D36214940
      
      Pulled By: nateanl
      
      fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
    • Add diagonal_loading optional to rtf_power (#2369) · da1e83cc
      Zhaoheng Ni authored
      Summary:
      When computing the MVDR beamforming weights using the power iteration method, diagonal loading can be applied to the PSD matrix of noise to improve robustness. This also applies when computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 for an example). It also keeps the behavior consistent with the current `torchaudio.transforms.MVDR` module.
      
      This PR adds the `diagonal_loading` argument with `True` as default value to `torchaudio.functional.rtf_power`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2369
      
      Reviewed By: carolineechen
      
      Differential Revision: D36204130
      
      Pulled By: nateanl
      
      fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
    • Add SoudenMVDR module (#2367) · aed5eb88
      Zhaoheng Ni authored
      Summary:
      Add a new design of MVDR module.
      The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
      The input arguments are:
      - multi-channel spectrum.
      - PSD matrix of target speech.
      - PSD matrix of noise.
      - reference channel in the microphone array.
      - diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
      - diag_eps for computing the inverse of the matrix.
      - eps for computing the beamforming weight.
      
      The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
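
For orientation, the Souden et al. formulation of the MVDR weight is commonly written as below. The notation is an assumption rather than something taken from the commit: $\mathbf{\Phi}_{SS}(f)$ and $\mathbf{\Phi}_{NN}(f)$ are the PSD matrices of target speech and noise, and $\mathbf{u}$ is a one-hot vector selecting the reference channel. Where exactly `diag_eps` and `eps` are applied is implementation-specific and is not shown.

```latex
\mathbf{w}_{\mathrm{MVDR}}(f)
  = \frac{\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{\Phi}_{SS}(f)}
         {\operatorname{tr}\!\left(\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{\Phi}_{SS}(f)\right)}\,\mathbf{u},
\qquad
\hat{S}(f,t) = \mathbf{w}_{\mathrm{MVDR}}^{\mathsf{H}}(f)\,\mathbf{Y}(f,t)
```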
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2367
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36198015
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
  6. 26 Apr, 2022 1 commit
  7. 18 Apr, 2022 1 commit
  8. 12 Apr, 2022 1 commit
    • Add Conformer RNN-T model prototype (#2322) · b0c8e239
      hwangjeff authored
      Summary:
      Adds the Conformer RNN-T model as a prototype feature, by way of the factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, the latter of which instantiates a baseline version of the model. Also includes the following:
      - Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
      - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
      - Introduces tests for `conformer_rnnt_model`.
      - Adds docs.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2322
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35565987
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
  9. 08 Apr, 2022 1 commit
    • Add devices/properties badges (#2321) · 72ae755a
      moto authored
      Summary:
      Add badges of supported properties and devices to functionals and transforms.
      
      This commit adds `.. devices::` and `.. properties::` directives to sphinx.
      
      APIs with these directives will have badges (based off of shields.io) which link to the
      page with description of these features.
      
      Continuation of https://github.com/pytorch/audio/issues/2316
      Excluded dtypes for further improvement, and actually added badges to most of functional/transforms.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2321
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35489063
      
      Pulled By: mthrok
      
      fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
  10. 01 Apr, 2022 1 commit
    • Loosen atol for melscale batch test for Windows (#2305) · d65a0f3e
      moto authored
      Summary:
      The `transforms.batch_consistency_test.TestTransforms` test is failing for Windows.
      
      https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272
      
      ```
      >       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
      E       AssertionError: Tensor-likes are not close!
      E
      E       Mismatched elements: 28 / 196608 (0.0%)
      E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
      E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
      ```
      
      The value of atol==1e-08 seems very strict but all the other batch
      consistency tests are passing.
      
      The violation affects only a very small number of samples, which looks
      suspicious, but I think it is okay to loosen atol to `1e-06` for Windows.
      
      `1e-06` is still more strict than the majority of the comparison tests we have.
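
The effect of the two tolerances can be sketched with the usual closeness criterion `|actual - expected| <= atol + rtol * |expected|`. The helper `is_close` is a hypothetical pure-Python stand-in, and the sample values are made up to resemble the absolute difference in the log above.

```python
def is_close(actual, expected, rtol, atol):
    """Closeness criterion: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Made-up sample: a small expected value with an absolute error of about 2e-07.
expected = 1e-4
actual = expected + 2e-7

strict = is_close(actual, expected, rtol=1e-5, atol=1e-8)
loose = is_close(actual, expected, rtol=1e-5, atol=1e-6)
print(strict, loose)
```

For values close to zero the `rtol` term contributes very little, so `atol` alone decides whether the comparison passes.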
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2305
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35298056
      
      Pulled By: mthrok
      
      fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
  11. 31 Mar, 2022 2 commits
    • Randomize initial phase of sinusoid data in test (#2301) · c6c6b689
      moto authored
      Summary:
      This commit updates the `get_sinusoid` function in the test utility so
      that when multiple channels are requested, the non-primary channels have
      a randomized initial phase.
      
      This adds some variety in test data which should not break the tests.
      Currently `get_sinusoid` returns identical waveforms for all the channels.
      This multi-channel support was added just to mock the input data so that
      it is easy to test features with multi-channel inputs, so tests should not
      expect all the channels to be identical.
      
      When working on numerical parity, it is more useful if the raw waveforms
      are somewhat different.
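
A pure-Python sketch of the idea, not the test utility itself. The function name `sinusoid_channels`, its parameters and its defaults are assumptions, and plain lists stand in for tensors.

```python
import math
import random


def sinusoid_channels(frequency=300.0, sample_rate=8000, duration=0.01,
                      n_channels=2, seed=0):
    """Return one list of samples per channel.

    The first channel starts at phase zero; every other channel gets a
    random initial phase drawn from [0, 2*pi).
    """
    rng = random.Random(seed)
    n_samples = int(sample_rate * duration)
    channels = []
    for ch in range(n_channels):
        phase = 0.0 if ch == 0 else rng.uniform(0.0, 2.0 * math.pi)
        channels.append([
            math.sin(2.0 * math.pi * frequency * t / sample_rate + phase)
            for t in range(n_samples)
        ])
    return channels


wave = sinusoid_channels()
print(len(wave), len(wave[0]), wave[0][0])
```

Seeding the generator keeps the test data reproducible while still making the channels differ from each other.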
      
      Image: waveforms generated by `get_sinusoid` after the change. left: 1st channel, right: 2nd channel
      <img width="524" alt="Screen Shot 2022-03-31 at 10 06 17 AM" src="https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png">
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2301
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35291689
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd
    • Move Kaldi comp tests to corresponding module (#2303) · ec552b69
      moto authored
      Summary:
      Tests on `torchaudio.compliance.kaldi` were scattered across different places.
      This commit puts all of them in the dedicated `test/torchaudio_unittest/compliance/kaldi/`
      directory.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2303
      
      Reviewed By: nateanl
      
      Differential Revision: D35288400
      
      Pulled By: mthrok
      
      fbshipit-source-id: 1426f236bc7786539d7a3110f992ad6220a52f46
  12. 04 Mar, 2022 2 commits
    • Flush and reset internal state after seek (#2264) · 7e1afc40
      moto authored
      Summary:
      This commit adds the following behavior to `seek` so that `seek`
      works after a frame is decoded.
      
      1. Flush the decoder buffer.
      2. Recreate filter graphs (so that internal state is re-initialized)
      3. Discard the buffered tensor. (decoded chunks)
      
      It also disallows negative values for the seek timestamp.
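
The behavior described above can be sketched with a toy class. `ToyStreamReader` and its attributes are hypothetical stand-ins: plain Python containers replace the FFmpeg decoder, filter graph and tensor buffer, and raising `ValueError` for a negative timestamp is an assumption about the error type.

```python
class ToyStreamReader:
    """Hypothetical stand-in; not the torchaudio implementation."""

    def __init__(self):
        self.decoder_buffer = []  # packets still held inside the decoder
        self.filter_state = {}    # internal state of the filter graph
        self.chunks = []          # decoded chunks not yet consumed
        self.position = 0.0

    def decode(self, packet):
        self.decoder_buffer.append(packet)
        seen = self.filter_state.get("frames_seen", 0)
        self.filter_state["frames_seen"] = seen + 1
        self.chunks.append(f"decoded:{packet}")

    def seek(self, timestamp):
        if timestamp < 0:
            raise ValueError("timestamp must be non-negative")
        self.decoder_buffer.clear()  # 1. flush the decoder buffer
        self.filter_state = {}       # 2. recreate the filter graph state
        self.chunks.clear()          # 3. discard buffered decoded chunks
        self.position = float(timestamp)


reader = ToyStreamReader()
reader.decode("packet-0")
reader.seek(1.5)
print(reader.decoder_buffer, reader.filter_state, reader.chunks, reader.position)
```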
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2264
      
      Reviewed By: carolineechen
      
      Differential Revision: D34497826
      
      Pulled By: mthrok
      
      fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
    • Make Streamer fail if an invalid option is provided (#2263) · 04875eef
      moto authored
      Summary:
      The `torchaudio.prototype.io.Streamer` class takes context-dependent options
      as the `option` argument in the form of a mapping of strings.

      Currently there is no check that the provided options are valid for
      the given input.

      This commit adds the check and raises an error if an invalid option is given.
      
      This is analogous to `ffmpeg` command error handling.
      
      ```
      $ ffmpeg -foo
      ...
      Unrecognized option 'foo'.
      ```
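
The kind of check described above can be sketched as follows. `check_options`, the set of recognized keys and the `ValueError` are all hypothetical; in the real class, which options are valid depends on the input and is determined by FFmpeg.

```python
def check_options(option, recognized):
    """Raise if `option` contains keys that are not in `recognized`."""
    unused = sorted(set(option) - set(recognized))
    if unused:
        listed = ", ".join(repr(key) for key in unused)
        raise ValueError(f"Unrecognized option(s): {listed}")


# Illustrative only: the keys accepted for some particular input.
RECOGNIZED = {"sample_rate", "channels"}

check_options({"sample_rate": "8000"}, RECOGNIZED)  # passes silently

try:
    check_options({"sample_rate": "8000", "foo": "1"}, RECOGNIZED)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```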
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2263
      
      Reviewed By: hwangjeff
      
      Differential Revision: D34495111
      
      Pulled By: mthrok
      
      fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
  13. 26 Feb, 2022 2 commits
  14. 25 Feb, 2022 5 commits
  15. 24 Feb, 2022 1 commit
  16. 17 Feb, 2022 2 commits
    • Refactor batch consistency test in functional (#2245) · 9cf59e75
      Zhaoheng Ni authored
      Summary:
      In the batch_consistency tests, the `assert_batch_consistency` method only accepts a single Tensor, which is not applicable to some methods. For example, `lfilter` and `filtfilt` require three Tensors as arguments, hence they don't use `assert_batch_consistency` in the tests.
      This PR refactors the test to accept a tuple of Tensors which have a `batch` dimension. Other arguments, such as `int` or `str` values, are given as `*args` after the tuple.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2245
      
      Reviewed By: mthrok
      
      Differential Revision: D34273035
      
      Pulled By: nateanl
      
      fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056
    • Add unit tests for PyTorch Lightning modules of emformer_rnnt recipes (#2240) · b5d77b15
      Zhaoheng Ni authored
      Summary:
      - Refactor the current `LibriSpeechRNNTModule`'s unit test.
      - Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`
      - Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.
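
The commit message does not say why the lambda had to go. A common reason for replacing a lambda with `functools.partial` is that lambdas cannot be pickled; treating that as the motivation here is an assumption. The sketch below only demonstrates that general difference, with the built-in `round` standing in for the recipe's function and `can_pickle` as a hypothetical helper.

```python
import pickle
from functools import partial

# Two callables with the same behaviour.
round_lambda = lambda x: round(x, 2)
round_partial = partial(round, ndigits=2)


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(round_lambda), can_pickle(round_partial))
```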
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2240
      
      Reviewed By: mthrok
      
      Differential Revision: D34285195
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f
  17. 16 Feb, 2022 2 commits
    • Refactor torchscript consistency test in functional (#2246) · 87d79889
      Zhaoheng Ni authored
      Summary:
      In the torchscript_consistency tests, the `func` in each test method accepts only one `tensor` argument; the other arguments of the `F.xyz` method need to be defined inside `func`. If `F.xyz` has no `Tensor` argument, the tests use a `dummy` tensor which is not used anywhere. In this PR, we refactor ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of just one `tensor`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2246
      
      Reviewed By: carolineechen
      
      Differential Revision: D34273057
      
      Pulled By: nateanl
      
      fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b
    • Add complex dtype support in functional autograd test (#2244) · eeba91dc
      Zhaoheng Ni authored
      Summary:
      In the autograd tests, to guarantee precision, the dtype of Tensors is converted to `torch.float64` if they are real. However, complex dtypes are not considered. This PR adds `self.complex_dtype` support for the inputs.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2244
      
      Reviewed By: mthrok
      
      Differential Revision: D34272998
      
      Pulled By: nateanl
      
      fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
  18. 15 Feb, 2022 1 commit
  19. 11 Feb, 2022 2 commits
  20. 09 Feb, 2022 2 commits
    • Clean up Emformer (#2207) · 87d7694d
      hwangjeff authored
      Summary:
      - Make `segment_length` a required argument rather than optional argument to force users to consciously choose input segment lengths for their use cases.
      - Clarify expected input shapes in API documentation.
      - Adjust `infer` tests to reflect expected usage.
      - Add assertion for input shape for `infer`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2207
      
      Reviewed By: mthrok
      
      Differential Revision: D34101205
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
    • Fix librosa calls (#2208) · e5d567c9
      hwangjeff authored
      Summary:
      Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2208
      
      Reviewed By: mthrok
      
      Differential Revision: D34099793
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
  21. 02 Feb, 2022 1 commit
  22. 01 Feb, 2022 2 commits
    • Move ASR features out of prototype (#2187) · aca5591c
      hwangjeff authored
      Summary:
      Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2187
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D33918092
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
    • Add CTC decoder timesteps (#2184) · d43ce015
      Caroline Chen authored
      Summary:
      Adds a `timesteps` field to CTC decoder hypotheses, corresponding to the time steps at which non-blank tokens occur.
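
What such time steps mean can be illustrated with a greedy CTC collapse. This is a simplified stand-in, not the decoder changed by this commit; the function name, the blank index `0` and the sample frame sequence are assumptions.

```python
def greedy_ctc_with_timesteps(frame_tokens, blank=0):
    """Collapse repeats, drop blanks, and record where each token first occurs."""
    tokens, timesteps = [], []
    previous = blank
    for t, token in enumerate(frame_tokens):
        if token != blank and token != previous:
            tokens.append(token)
            timesteps.append(t)
        previous = token
    return tokens, timesteps


# One token ID per frame; 0 is the blank symbol.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 1]
tokens, timesteps = greedy_ctc_with_timesteps(frames)
print(tokens, timesteps)
```

Each entry of `timesteps` is the frame index at which the corresponding emitted token first appears, so the two lists always have the same length.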
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2184
      
      Reviewed By: mthrok
      
      Differential Revision: D33905530
      
      Pulled By: carolineechen
      
      fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a