1. 28 Mar, 2023 2 commits
  2. 27 Mar, 2023 2 commits
    • Revise encoder config arg and docstrings (#3203) · b1de9f1a
      hwangjeff authored
      Summary:
      For `StreamWriter`,
      * Renames arg `config` to `codec_config`.
      * Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`.
      * Adds docstrings for arg `codec_config`.
      * Updates `chunk` to `frames` in `write_*_chunk` methods.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3203
      
      Reviewed By: mthrok
      
      Differential Revision: D44350153
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343
    • Refactor the initialization of EncodeProcess (#3205) · 4eac61a3
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3205
      
      This commit refactors the initialization of EncodeProcess.
      
      Interface-wise, the signature of the constructor of EncodeProcess
      has been made simpler: it just takes rvalues of its components, and
      the initialization of the components has been moved to helper functions.
      
      Implementation-wise, the order in which the components are initialized
      is revised, and the source of initialization parameters is also revised.
      
      For example, the original implementation first creates AVCodecContext,
      and passes it around to create the other components. This relied on
      an assumption that the parameters AVCodecContext has (such as image size
      and sample rate) are the same as those of the source data. This is not
      always true, and as we introduce custom filter graphs and allow on-the-fly
      transformation of rates and dimensions, it will become even less true.
      
      The new initialization constructs the source AVFrame, TensorConverter and
      FilterGraph from source attributes. This makes it easy to introduce
      on-the-fly transforms.
      
      Reviewed By: nateanl
      
      Differential Revision: D44360650
      
      fbshipit-source-id: bf0e77dc1a5a40fc8e9870c50d07339d812762e8
  3. 25 Mar, 2023 1 commit
    • Properly set #samples passed to encoder (#3204) · d8a37a21
      moto authored
      Summary:
      Some audio encoders expect a specific, exact number of samples per frame, as described in `AVCodecContext.frame_size`.
      
      The `AVFrame.nb_samples` is set for the frames passed to `AVFilterGraph`,
      but frames coming out of the graph do not necessarily have the same number of samples.
      
      This causes issues with encoding OPUS (among others).
      
      This commit fixes it by inserting `asetnsamples` into the filter graph if a fixed number of samples is requested.
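
      The regrouping idea behind this fix can be sketched in plain Python. This is only an illustration, not FFmpeg's implementation: `regroup_samples` and its `pad` argument are hypothetical names, and zero-padding the last short frame is an assumption made for the example.

```python
from typing import Iterable, Iterator, List


def regroup_samples(
    frames: Iterable[List[float]], frame_size: int, pad: bool = True
) -> Iterator[List[float]]:
    """Regroup variable-length frames into frames of exactly `frame_size` samples."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    pending: List[float] = []
    for frame in frames:
        pending.extend(frame)
        # Emit as many full frames as the buffered samples allow.
        while len(pending) >= frame_size:
            yield pending[:frame_size]
            pending = pending[frame_size:]
    if pending:
        # Last, partial frame: zero-pad it (assumption) or pass it through short.
        if pad:
            pending = pending + [0.0] * (frame_size - len(pending))
        yield pending


# Frames of 3, 5 and 2 samples regrouped into frames of 4 samples.
source = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0, 8.0], [9.0, 10.0]]
regrouped = list(regroup_samples(source, frame_size=4))
```

      Every frame in `regrouped` has exactly four samples, which is the property an encoder with a fixed `frame_size` relies on.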
      
      Note:
      It turned out that FFmpeg 4.1 has an issue with OPUS encoding: it does not properly discard some samples.
      We should probably move the minimum required FFmpeg to 4.2, but I am not sure if we can enforce it via ABI.
      A workaround will be to issue a warning when encoding OPUS with 4.1. (follow-up)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3204
      
      Reviewed By: nateanl
      
      Differential Revision: D44374668
      
      Pulled By: mthrok
      
      fbshipit-source-id: 10ef5333dc0677dfb83c8e40b78edd8ded1b21dc
  4. 23 Mar, 2023 6 commits
  5. 22 Mar, 2023 2 commits
    • Fix oscillator bank test (#3196) · aa590a1b
      moto authored
      Summary:
      Follow-up to https://github.com/pytorch/audio/pull/3083
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3196
      
      Reviewed By: nateanl
      
      Differential Revision: D44308940
      
      Pulled By: mthrok
      
      fbshipit-source-id: e3ef27656e74c28ae78b767517d8e0ba3a9ac4a6
    • [Nova] Windows Adopt ffmpeg build to be executed from github actions (#3193) · da86fbc7
      atalman authored
      Summary:
      Adapt the FFmpeg build so that it can be executed from GitHub Actions on Windows.
      
      Tested by manually invoking this script:
      ```
      c:\actions-runner\_work\test-infra\test-infra\pytorch\audio
      Chocolatey v1.2.1
      Installing the following packages:
      msys2
      By installing, you accept licenses for the packages.
      msys2 v20230318.0.0 already installed.
       Use --force to reinstall, specify a version to install, or try upgrade.
      
      Chocolatey installed 0/1 packages.
       See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
      
      Warnings:
       - msys2 - msys2 v20230318.0.0 already installed.
       Use --force to reinstall, specify a version to install, or try upgrade.
      
      Did you know the proceeds of Pro (and some proceeds from other
       licensed editions) go into bettering the community infrastructure?
       Your support ensures an active community, keeps Chocolatey tip-top,
       plus it nets you some awesome features!
       https://chocolatey.org/compare
      warning: base-devel-...
      ```
  6. 21 Mar, 2023 6 commits
  7. 20 Mar, 2023 3 commits
    • Refactor StreamReader internals (#3184) · c17226a0
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3184
      
      Tweak internals of StreamReader
      1. Pass time_base to Buffer class so that
          * no need to pass frame_duration separately
          * Conversion of PTS to double type can be delayed until it is popped
      2. Merge `get_output_timebase` method into `get_output_stream_info`.
      3. If filter description is not provided, fill in null filter at top-level StreamReader
      4. Expose filter and filter description from Sink class to get rid of wrapper get methods.
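
      Item 1 above can be pictured with a small sketch. The `FrameBuffer` class is hypothetical (it is not the torchaudio `Buffer`). It only assumes that timestamps arrive as integer PTS counted in units of `time_base`, and it shows the conversion to seconds being deferred until a frame is popped.

```python
from collections import deque
from fractions import Fraction
from typing import Deque, Tuple


class FrameBuffer:
    """Hypothetical buffer that keeps integer PTS and converts on pop."""

    def __init__(self, time_base: Fraction) -> None:
        self.time_base = time_base
        self._frames: Deque[Tuple[int, bytes]] = deque()

    def push(self, pts: int, data: bytes) -> None:
        # Store the raw integer timestamp; no floating-point work here.
        self._frames.append((pts, data))

    def pop(self) -> Tuple[float, bytes]:
        pts, data = self._frames.popleft()
        # Convert to seconds only when the frame leaves the buffer.
        return float(pts * self.time_base), data


buffer = FrameBuffer(time_base=Fraction(1, 90000))
buffer.push(180000, b"frame-0")
seconds, payload = buffer.pop()
```

      Because the buffer knows `time_base`, callers do not need to pass a separate frame duration or pre-converted timestamps.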
      
      Reviewed By: nateanl
      
      Differential Revision: D44207976
      
      fbshipit-source-id: f25ac9be69c9897e9dcec0c6e978f29b83b166e8
    • Fix GPU memory leak on StreamReader (#3186) · 9533d300
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3186
      
      Fix the GPU memory leak introduced in https://github.com/pytorch/audio/pull/3183
      
      The HW frames context is owned by AVCodecContext.
      The removed `av_buffer_ref` call incremented the reference count unnecessarily,
      and prevented AVCodecContext from freeing the resource.
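
      A toy model of this kind of leak, for illustration only. `RefCountedBuffer` and `owner_releases` are hypothetical names and do not mirror FFmpeg's API. The sketch only shows that an extra reference which is never released keeps the resource alive after its owner has cleaned up.

```python
class RefCountedBuffer:
    """Toy stand-in for a reference-counted resource (hypothetical)."""

    def __init__(self) -> None:
        self.refcount = 1  # the creator holds the first reference
        self.freed = False

    def ref(self) -> "RefCountedBuffer":
        # Take one more reference to the same resource.
        self.refcount += 1
        return self

    def unref(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # released together with the last reference


def owner_releases(take_extra_ref: bool) -> bool:
    """Return True if the resource is freed once the owner drops its reference."""
    frames_ctx = RefCountedBuffer()
    if take_extra_ref:
        frames_ctx.ref()  # extra reference that nobody ever releases
    frames_ctx.unref()  # the owner cleans up
    return frames_ctx.freed


leaked = not owner_releases(take_extra_ref=True)
released = owner_releases(take_extra_ref=False)
```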
      
      (Note: this ignores all push blocking failures!)
      
      Reviewed By: nateanl
      
      Differential Revision: D44231876
      
      fbshipit-source-id: 9be2c33049dd02a3fa82a85271de7fb62e5b09ea
    • Support CUDA frame in FilterGraph (#3183) · c5b96558
      moto authored
      Summary:
      This commit adds CUDA frame support to FilterGraph
      
      It initializes and attaches CUDA frames context to FilterGraph,
      so that CUDA frames can be processed in FilterGraph.
      
      As a result, it enables
      1. CUDA filter support such as `scale_cuda`
      2. Properly retrieve the pixel format coming out of FilterGraph when
         CUDA HW acceleration is enabled. (currently it is reported as "cuda")
      
      Resolves https://github.com/pytorch/audio/issues/3159
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3183
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44183722
      
      Pulled By: mthrok
      
      fbshipit-source-id: 522d21039c361ddfaa87fa89cf49c19d210ac62f
  8. 17 Mar, 2023 4 commits
  9. 16 Mar, 2023 2 commits
    • Fix initialization of `get_trellis`. (#3172) · a6b34a5d
      jiyuntu-eero authored
      Summary:
      Fixes https://github.com/pytorch/audio/issues/3166. In the `get_trellis` method, the index of the blank symbol is regarded as 0 by default. It should be changed to `blank_id`.
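
      A simplified, pure-Python sketch of why the index matters. It assumes that the first trellis column accumulates the per-frame score of the blank symbol. `blank_column` is a hypothetical helper, not the `get_trellis` code itself.

```python
from itertools import accumulate
from typing import List


def blank_column(emission: List[List[float]], blank_id: int) -> List[float]:
    """Cumulative blank score per frame, with a leading 0 for the start state."""
    return [0.0] + list(accumulate(row[blank_id] for row in emission))


# Two frames, three labels; the blank symbol sits at index 2, not 0.
emission = [[-1.0, -2.0, -0.5], [-3.0, -1.0, -0.25]]
fixed = blank_column(emission, blank_id=2)
buggy = blank_column(emission, blank_id=0)  # what a hard-coded 0 would read
```

      With a vocabulary whose blank is not at index 0, the two columns differ, so every score derived from the first column would be off.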
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3172
      
      Reviewed By: mthrok
      
      Differential Revision: D44090889
      
      Pulled By: nateanl
      
      fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264
    • Refactor Tensor conversion in StreamReader (#3170) · 014d7140
      moto authored
      Summary:
      Currently, when the Buffer converts AVFrame* to torch::Tensor,
      it checks the format each time a frame is passed, and then
      performs the conversion.
      
      This commit changes it so that the conversion operation is
      pre-instantiated at the time the output stream is configured.
      
      It introduces Converter implementations for various formats,
      and uses templates to embed them in the Buffer class.
      This way, branches such as if/switch are eliminated from
      the decoding path.
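
      The same idea in a Python analogy. The commit itself uses C++ templates. The names `CONVERTERS`, `convert_u8`, `convert_s16le` and this `Buffer` class are hypothetical and only illustrate selecting the converter once, at configuration time.

```python
from typing import Callable, Dict, List

Converter = Callable[[bytes], List[int]]


def convert_u8(raw: bytes) -> List[int]:
    # One unsigned byte per sample.
    return list(raw)


def convert_s16le(raw: bytes) -> List[int]:
    # Two bytes per sample, little-endian, signed.
    return [
        int.from_bytes(raw[i : i + 2], "little", signed=True)
        for i in range(0, len(raw), 2)
    ]


CONVERTERS: Dict[str, Converter] = {"u8": convert_u8, "s16le": convert_s16le}


class Buffer:
    def __init__(self, fmt: str) -> None:
        # Format lookup happens once, when the output stream is configured.
        self._convert = CONVERTERS[fmt]
        self.frames: List[List[int]] = []

    def push(self, raw: bytes) -> None:
        # No per-frame format check on the decoding path.
        self.frames.append(self._convert(raw))


buffer = Buffer("s16le")
buffer.push(b"\x01\x00\xff\xff")
```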
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3170
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D44048293
      
      Pulled By: mthrok
      
      fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f
  10. 15 Mar, 2023 2 commits
    • Enhance UX on TorchAudio pages to improve awareness of doc versioning (#3167) · 92f2ea89
      Carl Parker authored
      Summary:
      - Boldface the version-selection UX and increase size by three percent.
      - Add text to breadcrumbs to indicate version and stability.
      - New `breadcrumbs.html` in `_templates` overrides Sphinx version.
      
      I create a new variable in `conf.py`, **version_stable**, which has the version number for the most-recent stable release. I define this variable in the **html_context** dictionary so that it is visible to the templates.
      
      I use this approach because I was not able to find any other way of discerning the current stable release during the build. Note that the `versions.html` file--which identifies the current stable release--appears to be available only in the **gh-pages** branch and so it is not available at build time.
      
      However, this means that someone will need to update `conf.py` whenever the current stable release changes.
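
      A sketch of the `conf.py` change described above. The version string is a placeholder and the exact layout in the repository may differ. `html_context` is the standard Sphinx setting whose entries are passed to the HTML templates.

```python
# Sketch of the relevant part of `conf.py`.
# The value below is a placeholder, not the real stable release.
version_stable = "X.Y"

# Entries of `html_context` become variables in the Sphinx HTML templates,
# so a template such as `breadcrumbs.html` can refer to `version_stable`.
html_context = {
    "version_stable": version_stable,
}
```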
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3167
      
      Reviewed By: mthrok
      
      Differential Revision: D44112224
      
      Pulled By: carljparker
      
      fbshipit-source-id: e76f5cb6734a784d161342964459577aa9b64cac
    • Fix MFCC autograd test (#3169) · ee0b97f2
      Zhaoheng Ni authored
      Summary:
      The autograd test randomly fails for the MFCC transform. Fix it by increasing `nondet_tol` to `1e-10`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3169
      
      Reviewed By: xiaohui-zhang, mthrok
      
      Differential Revision: D44069673
      
      Pulled By: nateanl
      
      fbshipit-source-id: addafefe381104e778b09bfbaafb322df1d9054c
  11. 14 Mar, 2023 2 commits
  12. 09 Mar, 2023 2 commits
    • Refactor StreamReader - let StreamProcessor own codec context (#3157) · a8f4e97b
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3157
      
      AVCodecContext plays a central role in decoding and encoding.
      Currently in StreamReader, the object is owned inside the Decoder class
      and is not accessible from other objects.
      
      This commit moves the ownership of AVCodecContext out of Decoder to
      the StreamProcessor class so that other components can access its fields.
      
      Also, the Decoder class, which is a very thin wrapper around the AVCodecContext
      object, is now absorbed into the StreamProcessor class.
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D43924664
      
      fbshipit-source-id: e53254955d9ce16871e393bcd8bb2794ce6a51ff
    • Remove private helper methods from StreamReader (#3156) · 430dd17c
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3156
      
      Remove helper methods that are not worth keeping as private methods.
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D43919385
      
      fbshipit-source-id: 2ce4efaf5ec9418076e78c7ce1f842e0dd7e3028
  13. 08 Mar, 2023 3 commits
    • Fix documentation of functional and transforms (#3134) · 85cb37e2
      cai525 authored
      Summary:
      Addresses #3101. The documentation for `power=1` should say magnitude instead of energy.
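
      A small worked example of the distinction, using a single complex spectrogram bin. `spectrogram_value` is a hypothetical helper that applies the exponent to the absolute value, which is the convention the corrected wording describes.

```python
def spectrogram_value(x: complex, power: float) -> float:
    """Raise the absolute value of a complex bin to `power`."""
    return abs(x) ** power


bin_value = complex(3.0, 4.0)
magnitude = spectrogram_value(bin_value, power=1.0)  # |x|
energy = spectrogram_value(bin_value, power=2.0)  # |x| squared
```

      For this bin the magnitude is 5.0 and the energy (power) is 25.0, so `power=1` yields the magnitude.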
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3134
      
      Reviewed By: mthrok
      
      Differential Revision: D43910652
      
      Pulled By: nateanl
      
      fbshipit-source-id: e0768438e819222a5dde6b86c5123ab0e8af59fb
    • Include format information after filter (#3155) · 146195d8
      moto authored
      Summary:
      This commit adds fields to OutputStream that show the result
      of filters, such as the width and height after filtering.
      
      Before
      
      ```
      OutputStream(
          source_index=0,
          filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
      ```
      
      After
      
      ```
      OutputVideoStream(
          source_index=0,
          filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
          media_type='video',
          format='gray',
          width=320,
          height=320,
          frame_rate=3.0)
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3155
      
      Reviewed By: nateanl
      
      Differential Revision: D43882399
      
      Pulled By: mthrok
      
      fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d
    • Support overwriting PTS in StreamWriter (#3135) · 8d2f6f8d
      moto authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3135
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D43724273
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9b52823618948945a26e57d5b3deccbf5f9268c1
  14. 07 Mar, 2023 3 commits