Commit 72b712a1 authored by moto, committed by Facebook GitHub Bot

Move Streamer API out of prototype (#2378)

Summary:
This commit moves the Streaming API out of the prototype module.

* The related classes are renamed as follows:

  - `Streamer` -> `StreamReader`
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

This change is a preemptive measure for the possible future addition of a
`StreamWriter` API.
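
For illustration, a minimal sketch of the renamed API as it is now exposed from `torchaudio.io` (the media file name and chunk parameters are arbitrary placeholders, and a nightly build with FFmpeg support is assumed):

```python
# Minimal sketch of the renamed API; "example.mp4" is a placeholder, not a bundled asset.
from torchaudio.io import StreamReader

reader = StreamReader("example.mp4")
# Decode the default audio stream in 4096-frame chunks, resampled to 16 kHz on the fly.
reader.add_basic_audio_stream(frames_per_chunk=4096, sample_rate=16000)
for (chunk,) in reader.stream():
    pass  # each chunk is a Tensor of shape (frames_per_chunk, num_channels)
```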

* Replace BUILD_FFMPEG build arg with USE_FFMPEG

We are not building FFmpeg itself, so `USE_FFMPEG` is more appropriate.
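
For reference, a sketch of how the renamed flag is read at build time, mirroring the `_get_build` helper that appears in the setup-helper diff further below (the exact value parsing here is an assumption):

```python
# Sketch only: how a USE_FFMPEG environment variable can be consumed at build time.
# The helper name _get_build comes from the setup-helper diff in this commit;
# the accepted truthy values below are an assumption.
import os

def _get_build(var_name, default=False):
    val = os.environ.get(var_name)
    if val is None:
        return default
    return val.strip().lower() in ("1", "true", "on", "yes")

_USE_FFMPEG = _get_build("USE_FFMPEG", False)  # previously BUILD_FFMPEG
```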

 ---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs (to be addressed in separate PRs):
- [ ] Introduce an `is_ffmpeg_binding_available` function (a rough sketch follows this list).
- [ ] Refactor C++ code:
   - Rename `Streamer` to `StreamReader`.
   - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
   - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
   - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
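
As a rough illustration of the first TODO item, a hypothetical `is_ffmpeg_binding_available` could check only whether the `libtorchaudio_ffmpeg` extension loads; the sketch below is not part of this commit and its details are assumptions:

```python
# Hypothetical sketch only; not part of this commit.
def is_ffmpeg_binding_available():
    import torchaudio
    try:
        # The FFmpeg binding ships as the libtorchaudio_ffmpeg extension
        # (see torchaudio/io/__init__.py in this diff).
        torchaudio._extension._load_lib("libtorchaudio_ffmpeg")
        return True
    except OSError:
        return False
```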

Pull Request resolved: https://github.com/pytorch/audio/pull/2378

Reviewed By: carolineechen

Differential Revision: D36359299

Pulled By: mthrok

fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
parent 9499f642
......@@ -254,7 +254,7 @@ jobs:
export FFMPEG_ROOT=${PWD}/third_party/ffmpeg
packaging/build_wheel.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: dist
- persist_to_workspace:
......@@ -277,7 +277,7 @@ jobs:
export FFMPEG_ROOT=${PWD}/third_party/ffmpeg
packaging/build_conda.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: /opt/conda/conda-bld/linux-64
- persist_to_workspace:
......@@ -305,7 +305,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
packaging/build_wheel.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: dist
- persist_to_workspace:
......@@ -331,7 +331,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
packaging/build_conda.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: /Users/distiller/miniconda3/conda-bld/osx-64
- persist_to_workspace:
......@@ -358,7 +358,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
bash packaging/build_wheel.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: dist
- persist_to_workspace:
......@@ -391,7 +391,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
bash packaging/build_conda.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: C:/tools/miniconda3/conda-bld/win-64
- persist_to_workspace:
......@@ -621,7 +621,7 @@ jobs:
command: .circleci/unittest/linux/scripts/install.sh
environment:
BUILD_MAD: true
BUILD_FFMPEG: true
USE_FFMPEG: true
- run:
name: Run tests
command: .circleci/unittest/linux/scripts/run_test.sh
......@@ -654,7 +654,7 @@ jobs:
command: docker run -t --gpus all -e PYTHON_VERSION -v $PWD:$PWD -w $PWD "${image_name}" .circleci/unittest/linux/scripts/setup_env.sh
- run:
name: Install torchaudio
command: docker run -t --gpus all -e UPLOAD_CHANNEL -e CONDA_CHANNEL_FLAGS -e BUILD_FFMPEG=1 -e BUILD_MAD=1 -v $PWD:$PWD -w $PWD "${image_name}" .circleci/unittest/linux/scripts/install.sh
command: docker run -t --gpus all -e UPLOAD_CHANNEL -e CONDA_CHANNEL_FLAGS -e USE_FFMPEG=1 -e BUILD_MAD=1 -v $PWD:$PWD -w $PWD "${image_name}" .circleci/unittest/linux/scripts/install.sh
- run:
name: Run tests
environment:
......@@ -681,7 +681,7 @@ jobs:
name: Install torchaudio
command: .circleci/unittest/windows/scripts/install.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- run:
name: Run tests
command: .circleci/unittest/windows/scripts/run_test.sh
......@@ -727,7 +727,7 @@ jobs:
name: Install torchaudio
command: .circleci/unittest/windows/scripts/install.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- run:
name: Run tests
command: .circleci/unittest/windows/scripts/run_test.sh
......@@ -766,7 +766,7 @@ jobs:
name: Install torchaudio
command: .circleci/unittest/linux/scripts/install.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
BUILD_MAD: true
- run:
name: Run tests
......
......@@ -254,7 +254,7 @@ jobs:
export FFMPEG_ROOT=${PWD}/third_party/ffmpeg
packaging/build_wheel.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: dist
- persist_to_workspace:
......@@ -277,7 +277,7 @@ jobs:
export FFMPEG_ROOT=${PWD}/third_party/ffmpeg
packaging/build_conda.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: /opt/conda/conda-bld/linux-64
- persist_to_workspace:
......@@ -305,7 +305,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
packaging/build_wheel.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: dist
- persist_to_workspace:
......@@ -331,7 +331,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
packaging/build_conda.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: /Users/distiller/miniconda3/conda-bld/osx-64
- persist_to_workspace:
......@@ -358,7 +358,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
bash packaging/build_wheel.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: dist
- persist_to_workspace:
......@@ -391,7 +391,7 @@ jobs:
export FFMPEG_ROOT="${PWD}/third_party/ffmpeg"
bash packaging/build_conda.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- store_artifacts:
path: C:/tools/miniconda3/conda-bld/win-64
- persist_to_workspace:
......@@ -621,7 +621,7 @@ jobs:
command: .circleci/unittest/linux/scripts/install.sh
environment:
BUILD_MAD: true
BUILD_FFMPEG: true
USE_FFMPEG: true
- run:
name: Run tests
command: .circleci/unittest/linux/scripts/run_test.sh
......@@ -654,7 +654,7 @@ jobs:
command: docker run -t --gpus all -e PYTHON_VERSION -v $PWD:$PWD -w $PWD "${image_name}" .circleci/unittest/linux/scripts/setup_env.sh
- run:
name: Install torchaudio
command: docker run -t --gpus all -e UPLOAD_CHANNEL -e CONDA_CHANNEL_FLAGS -e BUILD_FFMPEG=1 -e BUILD_MAD=1 -v $PWD:$PWD -w $PWD "${image_name}" .circleci/unittest/linux/scripts/install.sh
command: docker run -t --gpus all -e UPLOAD_CHANNEL -e CONDA_CHANNEL_FLAGS -e USE_FFMPEG=1 -e BUILD_MAD=1 -v $PWD:$PWD -w $PWD "${image_name}" .circleci/unittest/linux/scripts/install.sh
- run:
name: Run tests
environment:
......@@ -681,7 +681,7 @@ jobs:
name: Install torchaudio
command: .circleci/unittest/windows/scripts/install.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- run:
name: Run tests
command: .circleci/unittest/windows/scripts/run_test.sh
......@@ -727,7 +727,7 @@ jobs:
name: Install torchaudio
command: .circleci/unittest/windows/scripts/install.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
- run:
name: Run tests
command: .circleci/unittest/windows/scripts/run_test.sh
......@@ -766,7 +766,7 @@ jobs:
name: Install torchaudio
command: .circleci/unittest/linux/scripts/install.sh
environment:
BUILD_FFMPEG: true
USE_FFMPEG: true
BUILD_MAD: true
- run:
name: Run tests
......
......@@ -32,7 +32,7 @@ jobs:
python -m pip install --quiet pytest requests cmake ninja deep-phonemizer sentencepiece
python setup.py install
env:
BUILD_FFMPEG: true
USE_FFMPEG: true
- name: Run integration test
run: |
cd test && pytest integration_tests -v --use-tmp-hub-dir
......@@ -58,11 +58,11 @@ endif()
# Options
option(BUILD_SOX "Build libsox statically" ON)
option(BUILD_MAD "Enable libmad" OFF)
option(BUILD_FFMPEG "Enable ffmpeg-based features" OFF)
option(BUILD_KALDI "Build kaldi statically" ON)
option(BUILD_RNNT "Enable RNN transducer" ON)
option(BUILD_CTC_DECODER "Build Flashlight CTC decoder" ON)
option(BUILD_TORCHAUDIO_PYTHON_EXTENSION "Build Python extension" OFF)
option(USE_FFMPEG "Enable ffmpeg-based features" OFF)
option(USE_CUDA "Enable CUDA support" OFF)
option(USE_ROCM "Enable ROCM support" OFF)
option(USE_OPENMP "Enable OpenMP support" OFF)
......
......@@ -39,6 +39,7 @@ API References
:caption: API Reference
torchaudio
io
backend
functional
transforms
......@@ -58,7 +59,6 @@ Prototype API References
:caption: Prototype API Reference
prototype
prototype.io
prototype.ctc_decoder
prototype.models
prototype.pipelines
......
torchaudio.io
=============
.. currentmodule:: torchaudio.io
StreamReader
------------
.. autoclass:: StreamReader
:members:
StreamReaderSourceStream
------------------------
.. autoclass:: StreamReaderSourceStream
:members:
StreamReaderSourceAudioStream
-----------------------------
.. autoclass:: StreamReaderSourceAudioStream
:members:
StreamReaderSourceVideoStream
-----------------------------
.. autoclass:: StreamReaderSourceVideoStream
:members:
StreamReaderOutputStream
------------------------
.. autoclass:: StreamReaderOutputStream
:members:
torchaudio.prototype.io
=======================
.. currentmodule:: torchaudio.prototype.io
SourceStream
------------
.. autoclass:: SourceStream
:members:
SourceAudioStream
-----------------
.. autoclass:: SourceAudioStream
:members:
SourceVideoStream
-----------------
.. autoclass:: SourceVideoStream
:members:
OutputStream
------------
.. autoclass:: OutputStream
:members:
Streamer
--------
.. autoclass:: Streamer
:members:
......@@ -17,7 +17,6 @@ imported explicitly, e.g.
import torchaudio.prototype.ctc_decoder
.. toctree::
prototype.io
prototype.ctc_decoder
prototype.models
prototype.pipelines
......@@ -10,15 +10,17 @@ on laptop.
.. note::
This tutorial requires prototype Streaming API, ffmpeg>=4.1, and SentencePiece.
This tutorial requires Streaming API, FFmpeg libraries (>=4.1, <5),
and SentencePiece.
Prototype features are not part of binary releases, but available in
nightly build. Please refer to https://pytorch.org for installing
nightly build.
The Streaming API is available in nightly build.
Please refer to https://pytorch.org/get-started/locally
for instructions.
There are multiple ways to install FFmpeg libraries.
If you are using Anaconda Python distribution,
``conda install -c anaconda ffmpeg`` will install
the required libraries.
``conda install 'ffmpeg<5'`` will install
the required FFmpeg libraries.
You can install SentencePiece by running ``pip install sentencepiece``.
......@@ -49,7 +51,7 @@ on laptop.
#
# Firstly, we need to check the devices that Streaming API can access,
# and figure out the arguments (``src`` and ``format``) we need to pass
# to :py:func:`~torchaudio.prototype.io.Streamer` class.
# to :py:func:`~torchaudio.io.StreamReader` class.
#
# We use ``ffmpeg`` command for this. ``ffmpeg`` abstracts away the
# difference of underlying hardware implementations, but the expected
......@@ -76,7 +78,7 @@ on laptop.
#
# .. code::
#
# Streamer(
# StreamReader(
# src = ":1", # no video, audio from device 1, "MacBook Pro Microphone"
# format = "avfoundation",
# )
......@@ -100,7 +102,7 @@ on laptop.
#
# .. code::
#
# Streamer(
# StreamReader(
# src = "audio=@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{BF2B8AE1-10B8-4CA4-A0DC-D02E18A56177}",
# format = "dshow",
# )
......@@ -134,10 +136,10 @@ NUM_ITER = 100
def stream(q, format, src, segment_length, sample_rate):
from torchaudio.prototype.io import Streamer
from torchaudio.io import StreamReader
print("Building Streamer...")
streamer = Streamer(src, format=format)
print("Building StreamReader...")
streamer = StreamReader(src, format=format)
streamer.add_basic_audio_stream(frames_per_chunk=segment_length, sample_rate=sample_rate)
print(streamer.get_src_stream_info(0))
......@@ -170,7 +172,7 @@ def stream(q, format, src, segment_length, sample_rate):
#
# For the detail of ``timeout`` and ``backoff`` parameters, please refer
# to the documentation of
# :py:meth:`~torchaudio.prototype.io.Streamer.stream` method.
# :py:meth:`~torchaudio.io.StreamReader.stream` method.
#
# .. note::
#
......@@ -324,7 +326,7 @@ if __name__ == "__main__":
# Sample rate: 16000
# Main segment: 2560 frames (0.16 seconds)
# Right context: 640 frames (0.04 seconds)
# Building Streamer...
# Building StreamReader...
# SourceAudioStream(media_type='audio', codec='pcm_f32le', codec_long_name='PCM 32-bit floating point little-endian', format='flt', bit_rate=1536000, sample_rate=48000.0, num_channels=1)
# OutputStream(source_index=0, filter_description='aresample=16000,aformat=sample_fmts=fltp')
# Streaming...
......
......@@ -13,22 +13,17 @@ to perform online speech recognition.
#
# .. note::
#
# This tutorial requires torchaudio with prototype features,
# FFmpeg libraries (>=4.1), and SentencePiece.
# This tutorial requires Streaming API, FFmpeg libraries (>=4.1, <5),
# and SentencePiece.
#
# torchaudio prototype features are available on nightly builds.
# The Streaming API is available in nightly builds.
# Please refer to https://pytorch.org/get-started/locally/
# for instructions.
#
# The interfaces of prototype features are unstable and subject to
# change. Please refer to `the nightly build documentation
# <https://pytorch.org/audio/main/>`__ for the up-to-date
# API references.
#
# There are multiple ways to install FFmpeg libraries.
# If you are using Anaconda Python distribution,
# ``conda install -c anaconda ffmpeg`` will install
# the required libraries.
# ``conda install 'ffmpeg<5'`` will install
# the required FFmpeg libraries.
#
# You can install SentencePiece by running ``pip install sentencepiece``.
......@@ -54,7 +49,7 @@ import torch
import torchaudio
try:
from torchaudio.prototype.io import Streamer
from torchaudio.io import StreamReader
except ModuleNotFoundError:
try:
import google.colab
......@@ -123,7 +118,7 @@ print(f"Right context: {context_length} frames ({context_length / sample_rate} s
# 4. Configure the audio stream
# -----------------------------
#
# Next, we configure the input audio stream using :py:func:`~torchaudio.prototype.io.Streamer`.
# Next, we configure the input audio stream using :py:func:`~torchaudio.io.StreamReader`.
#
# For the detail of this API, please refer to the
# `Media Stream API tutorial <./streaming_api_tutorial.html>`__.
......@@ -139,7 +134,7 @@ print(f"Right context: {context_length} frames ({context_length / sample_rate} s
#
src = "https://download.pytorch.org/torchaudio/tutorial-assets/greatpiratestories_00_various.mp3"
streamer = Streamer(src)
streamer = StreamReader(src)
streamer.add_basic_audio_stream(frames_per_chunk=segment_length, sample_rate=bundle.sample_rate)
print(streamer.get_src_stream_info(0))
......
......@@ -12,21 +12,15 @@ libavfilter provides.
#
# .. note::
#
# This tutorial requires torchaudio with prototype features and
# FFmpeg libraries (>=4.1).
# This tutorial requires Streaming API and FFmpeg libraries (>=4.1, <5).
#
# The torchaudio prototype features are available on nightly builds.
# The Streaming API is available in nightly builds.
# Please refer to https://pytorch.org/get-started/locally/
# for instructions.
#
# The interfaces of prototype features are unstable and subject to
# change. Please refer to `the nightly build documentation
# <https://pytorch.org/audio/main/>`__ for the up-to-date
# API references.
#
# There are multiple ways to install FFmpeg libraries.
# If you are using Anaconda Python distribution,
# ``conda install -c anaconda ffmpeg`` will install
# ``conda install -c anaconda 'ffmpeg<5'`` will install
# the required libraries.
#
......@@ -72,7 +66,7 @@ import torch
import torchaudio
try:
from torchaudio.prototype.io import Streamer
from torchaudio.io import StreamReader
except ModuleNotFoundError:
try:
import google.colab
......@@ -120,29 +114,29 @@ VIDEO_URL = f"{base_url}/stream-api/NASAs_Most_Scientifically_Complex_Space_Obse
######################################################################
#
# To open a media file, you can simply pass the path of the file to
# the constructor of `Streamer`.
# the constructor of `StreamReader`.
#
# .. code::
#
# Streamer(src="audio.wav")
# StreamReader(src="audio.wav")
#
# Streamer(src="audio.mp3")
# StreamReader(src="audio.mp3")
#
# This works for image file, video file and video streams.
#
# .. code::
#
# # Still image
# Streamer(src="image.jpeg")
# StreamReader(src="image.jpeg")
#
# # Video file
# Streamer(src="video.mpeg")
# StreamReader(src="video.mpeg")
#
# # Video on remote server
# Streamer(src="https://example.com/video.mp4")
# StreamReader(src="https://example.com/video.mp4")
#
# # Playlist format
# Streamer(src="https://example.com/playlist.m3u")
# StreamReader(src="https://example.com/playlist.m3u")
#
# If attempting to load headerless raw data, you can use ``format`` and
# ``option`` to specify the format of the data.
......@@ -159,7 +153,7 @@ VIDEO_URL = f"{base_url}/stream-api/NASAs_Most_Scientifically_Complex_Space_Obse
#
# .. code::
#
# Streamer(src="raw.s2", format="s16le", option={"sample_rate": "16000"})
# StreamReader(src="raw.s2", format="s16le", option={"sample_rate": "16000"})
#
######################################################################
......@@ -170,30 +164,30 @@ VIDEO_URL = f"{base_url}/stream-api/NASAs_Most_Scientifically_Complex_Space_Obse
# the output streams.
#
# You can check the number of source streams with
# :py:attr:`~torchaudio.prototype.io.Streamer.num_src_streams`.
# :py:attr:`~torchaudio.io.StreamReader.num_src_streams`.
#
# .. note::
# The number of streams is NOT the number of channels.
# Each audio stream can contain an arbitrary number of channels.
#
# To check the metadata of source stream you can use
# :py:meth:`~torchaudio.prototype.io.Streamer.get_src_stream_info`
# :py:meth:`~torchaudio.io.StreamReader.get_src_stream_info`
# method and provide the index of the source stream.
#
# This method returns
# :py:class:`~torchaudio.prototype.io.SourceStream`. If a source
# :py:class:`~torchaudio.io.StreamReader.SourceStream`. If a source
# stream is audio type, then the return type is
# :py:class:`~torchaudio.prototype.io.SourceAudioStream`, which is
# :py:class:`~torchaudio.io.StreamReader.SourceAudioStream`, which is
# a subclass of `SourceStream`, with additional audio-specific attributes.
# Similarly, if a source stream is video type, then the return type is
# :py:class:`~torchaudio.prototype.io.SourceVideoStream`.
# :py:class:`~torchaudio.io.StreamReader.SourceVideoStream`.
######################################################################
# For regular audio formats and still image formats, such as `WAV`
# and `JPEG`, the number of source streams is 1.
#
streamer = Streamer(AUDIO_URL)
streamer = StreamReader(AUDIO_URL)
print("The number of source streams:", streamer.num_src_streams)
print(streamer.get_src_stream_info(0))
......@@ -203,7 +197,7 @@ print(streamer.get_src_stream_info(0))
#
src = "https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8"
streamer = Streamer(src)
streamer = StreamReader(src)
print("The number of source streams:", streamer.num_src_streams)
for i in range(streamer.num_src_streams):
print(streamer.get_src_stream_info(i))
......@@ -228,8 +222,8 @@ for i in range(streamer.num_src_streams):
# FFmpeg implements some heuristics to determine the default stream.
# The resulting stream index is exposed via
#
# :py:attr:`~torchaudio.prototype.io.Streamer.default_audio_stream` and
# :py:attr:`~torchaudio.prototype.io.Streamer.default_video_stream`.
# :py:attr:`~torchaudio.io.StreamReader.default_audio_stream` and
# :py:attr:`~torchaudio.io.StreamReader.default_video_stream`.
#
######################################################################
......@@ -238,8 +232,8 @@ for i in range(streamer.num_src_streams):
#
# Once you know which source stream you want to use, then you can
# configure output streams with
# :py:meth:`~torchaudio.prototype.io.Streamer.add_basic_audio_stream` and
# :py:meth:`~torchaudio.prototype.io.Streamer.add_basic_video_stream`.
# :py:meth:`~torchaudio.io.StreamReader.add_basic_audio_stream` and
# :py:meth:`~torchaudio.io.StreamReader.add_basic_video_stream`.
#
# These methods provide a simple way to change the basic property of
# media to match the application's requirements.
......@@ -253,15 +247,15 @@ for i in range(streamer.num_src_streams):
# For video, it will be
# `(frames_per_chunk, num_channels, height, width)`.
# - ``buffer_chunk_size``: The maximum number of chunks to be buffered internally.
# When the Streamer buffered this number of chunks and is asked to pull
# more frames, Streamer drops the old frames/chunks.
# When the StreamReader buffered this number of chunks and is asked to pull
# more frames, StreamReader drops the old frames/chunks.
# - ``stream_index``: The index of the source stream.
#
# For audio output stream, you can provide the following additional
# parameters to change the audio properties.
#
# - ``sample_rate``: When provided, Streamer resamples the audio on-the-fly.
# - ``dtype``: By default the Streamer returns tensor of `float32` dtype,
# - ``sample_rate``: When provided, StreamReader resamples the audio on-the-fly.
# - ``dtype``: By default the StreamReader returns tensor of `float32` dtype,
# with sample values ranging `[-1, 1]`. By providing ``dtype`` argument
# the resulting dtype and value range is changed.
#
......@@ -277,7 +271,7 @@ for i in range(streamer.num_src_streams):
#
# .. code::
#
# streamer = Streamer(...)
# streamer = StreamReader(...)
#
# # Stream audio from default audio source stream
# # 256 frames at a time, keeping the original sampling rate.
......@@ -324,9 +318,9 @@ for i in range(streamer.num_src_streams):
#
# You can check the resulting output streams in a similar manner as
# checking the source streams.
# :py:attr:`~torchaudio.prototype.io.Streamer.num_out_streams` reports
# :py:attr:`~torchaudio.io.StreamReader.num_out_streams` reports
# the number of configured output streams, and
# :py:meth:`~torchaudio.prototype.io.Streamer.get_out_stream_info`
# :py:meth:`~torchaudio.io.StreamReader.get_out_stream_info`
# fetches the information about the output streams.
#
# .. code::
......@@ -338,7 +332,7 @@ for i in range(streamer.num_src_streams):
######################################################################
#
# If you want to remove an output stream, you can do so with
# :py:meth:`~torchaudio.prototype.io.Streamer.remove_stream` method.
# :py:meth:`~torchaudio.io.StreamReader.remove_stream` method.
#
# .. code::
#
......@@ -355,16 +349,16 @@ for i in range(streamer.num_src_streams):
# audio / video data to client code.
#
# There are low-level methods that performs these operations.
# :py:meth:`~torchaudio.prototype.io.Streamer.is_buffer_ready`,
# :py:meth:`~torchaudio.prototype.io.Streamer.process_packet` and
# :py:meth:`~torchaudio.prototype.io.Streamer.pop_chunks`.
# :py:meth:`~torchaudio.io.StreamReader.is_buffer_ready`,
# :py:meth:`~torchaudio.io.StreamReader.process_packet` and
# :py:meth:`~torchaudio.io.StreamReader.pop_chunks`.
#
# In this tutorial, we will use the high-level API, iterator protocol.
# It is as simple as a ``for`` loop.
#
# .. code::
#
# streamer = Streamer(...)
# streamer = StreamReader(...)
# streamer.add_basic_audio_stream(...)
# streamer.add_basic_video_stream(...)
#
......@@ -404,7 +398,7 @@ for i in range(streamer.num_src_streams):
# Firstly, let's list the available streams and its properties.
#
streamer = Streamer(VIDEO_URL)
streamer = StreamReader(VIDEO_URL)
for i in range(streamer.num_src_streams):
print(streamer.get_src_stream_info(i))
......@@ -582,7 +576,7 @@ plt.show(block=False)
#
# .. code::
#
# >>> Streamer(
# >>> StreamReader(
# ... src="0:0", # The first 0 means `FaceTime HD Camera`, and
# ... # the second 0 indicates `MacBook Pro Microphone`.
# ... format="avfoundation",
......@@ -601,7 +595,7 @@ plt.show(block=False)
#
# .. code::
#
# >>> streamer = Streamer(
# >>> streamer = StreamReader(
# ... src="0:0",
# ... format="avfoundation",
# ... option={"framerate": "30", "pixel_format": "bgr0"},
......@@ -640,7 +634,7 @@ plt.show(block=False)
#
# .. code::
#
# Streamer(src="sine=sample_rate=8000:frequency=360", format="lavfi")
# StreamReader(src="sine=sample_rate=8000:frequency=360", format="lavfi")
#
# .. raw:: html
#
......@@ -661,7 +655,7 @@ plt.show(block=False)
# .. code::
#
# # 5 Hz binaural beats on a 360 Hz carrier
# Streamer(
# StreamReader(
# src=(
# 'aevalsrc='
# 'sample_rate=8000:'
......@@ -687,7 +681,7 @@ plt.show(block=False)
#
# .. code::
#
# Streamer(src="anoisesrc=color=pink:sample_rate=8000:amplitude=0.5", format="lavfi")
# StreamReader(src="anoisesrc=color=pink:sample_rate=8000:amplitude=0.5", format="lavfi")
#
# .. raw:: html
#
......@@ -711,7 +705,7 @@ plt.show(block=False)
#
# .. code::
#
# Streamer(src=f"cellauto", format="lavfi")
# StreamReader(src=f"cellauto", format="lavfi")
#
# .. raw:: html
#
......@@ -727,7 +721,7 @@ plt.show(block=False)
#
# .. code::
#
# Streamer(src=f"mandelbrot", format="lavfi")
# StreamReader(src=f"mandelbrot", format="lavfi")
#
# .. raw:: html
#
......@@ -743,7 +737,7 @@ plt.show(block=False)
#
# .. code::
#
# Streamer(src=f"mptestsrc", format="lavfi")
# StreamReader(src=f"mptestsrc", format="lavfi")
#
# .. raw:: html
#
......@@ -759,7 +753,7 @@ plt.show(block=False)
#
# .. code::
#
# Streamer(src=f"life", format="lavfi")
# StreamReader(src=f"life", format="lavfi")
#
# .. raw:: html
#
......@@ -775,7 +769,7 @@ plt.show(block=False)
#
# .. code::
#
# Streamer(src=f"sierpinski", format="lavfi")
# StreamReader(src=f"sierpinski", format="lavfi")
#
# .. raw:: html
#
......@@ -789,8 +783,8 @@ plt.show(block=False)
# ------------------------
#
# When defining an output stream, you can use
# :py:meth:`~torchaudio.prototype.io.Streamer.add_audio_stream` and
# :py:meth:`~torchaudio.prototype.io.Streamer.add_video_stream` methods.
# :py:meth:`~torchaudio.io.StreamReader.add_audio_stream` and
# :py:meth:`~torchaudio.io.StreamReader.add_video_stream` methods.
#
# These methods take ``filter_desc`` argument, which is a string
# formatted according to ffmpeg's
......@@ -852,7 +846,7 @@ descs = [
sample_rate = 8000
streamer = Streamer(AUDIO_URL)
streamer = StreamReader(AUDIO_URL)
for desc in descs:
streamer.add_audio_stream(
frames_per_chunk=40000,
......@@ -929,7 +923,7 @@ descs = [
######################################################################
#
streamer = Streamer(VIDEO_URL)
streamer = StreamReader(VIDEO_URL)
for desc in descs:
streamer.add_video_stream(
frames_per_chunk=30,
......
......@@ -46,12 +46,13 @@ build:
- BUILD_VERSION
- USE_CUDA
- TORCH_CUDA_ARCH_LIST
- BUILD_FFMPEG
- USE_FFMPEG
- FFMPEG_ROOT
test:
imports:
- torchaudio
- torchaudio.io
- torchaudio.datasets
- torchaudio.kaldi_io
- torchaudio.sox_effects
......
......@@ -117,7 +117,7 @@ def is_ffmpeg_available():
global _IS_FFMPEG_AVAILABLE
if _IS_FFMPEG_AVAILABLE is None:
try:
from torchaudio.prototype.io import Streamer # noqa: F401
from torchaudio.io import StreamReader # noqa: F401
_IS_FFMPEG_AVAILABLE = True
except Exception:
......
......@@ -14,11 +14,11 @@ from torchaudio_unittest.common_utils import (
)
if is_ffmpeg_available():
from torchaudio.prototype.io import (
Streamer,
SourceStream,
SourceVideoStream,
SourceAudioStream,
from torchaudio.io import (
StreamReader,
StreamReaderSourceStream,
StreamReaderSourceVideoStream,
StreamReaderSourceAudioStream,
)
......@@ -27,13 +27,13 @@ def get_video_asset(file="nasa_13013.mp4"):
@skipIfNoFFmpeg
class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
"""Test suite for interface behaviors around Streamer"""
class StreamReaderInterfaceTest(TempDirMixin, TorchaudioTestCase):
"""Test suite for interface behaviors around StreamReader"""
def test_streamer_invalid_input(self):
"""Streamer constructor does not segfault but raise an exception when the input is invalid"""
"""StreamReader constructor does not segfault but raise an exception when the input is invalid"""
with self.assertRaises(RuntimeError):
Streamer("foobar")
StreamReader("foobar")
@nested_params(
[
......@@ -46,20 +46,20 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
[{}, {"sample_rate": "16000"}],
)
def test_streamer_invalide_option(self, invalid_keys, options):
"""When invalid options are given, Streamer raises an exception with these keys"""
"""When invalid options are given, StreamReader raises an exception with these keys"""
options.update({k: k for k in invalid_keys})
src = get_video_asset()
with self.assertRaises(RuntimeError) as ctx:
Streamer(src, option=options)
StreamReader(src, option=options)
assert all(f'"{k}"' in str(ctx.exception) for k in invalid_keys)
def test_src_info(self):
"""`get_src_stream_info` properly fetches information"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
assert s.num_src_streams == 6
expected = [
SourceVideoStream(
StreamReaderSourceVideoStream(
media_type="video",
codec="h264",
codec_long_name="H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10",
......@@ -69,7 +69,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
height=180,
frame_rate=25.0,
),
SourceAudioStream(
StreamReaderSourceAudioStream(
media_type="audio",
codec="aac",
codec_long_name="AAC (Advanced Audio Coding)",
......@@ -78,14 +78,14 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
sample_rate=8000.0,
num_channels=2,
),
SourceStream(
StreamReaderSourceStream(
media_type="subtitle",
codec="mov_text",
codec_long_name="MOV text",
format=None,
bit_rate=None,
),
SourceVideoStream(
StreamReaderSourceVideoStream(
media_type="video",
codec="h264",
codec_long_name="H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10",
......@@ -95,7 +95,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
height=270,
frame_rate=29.97002997002997,
),
SourceAudioStream(
StreamReaderSourceAudioStream(
media_type="audio",
codec="aac",
codec_long_name="AAC (Advanced Audio Coding)",
......@@ -104,7 +104,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
sample_rate=16000.0,
num_channels=2,
),
SourceStream(
StreamReaderSourceStream(
media_type="subtitle",
codec="mov_text",
codec_long_name="MOV text",
......@@ -117,30 +117,30 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_src_info_invalid_index(self):
"""`get_src_stream_info` does not segfault but raise an exception when input is invalid"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
for i in [-1, 6, 7, 8]:
with self.assertRaises(IndexError):
s.get_src_stream_info(i)
def test_default_streams(self):
"""default stream is not None"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
assert s.default_audio_stream is not None
assert s.default_video_stream is not None
def test_default_audio_stream_none(self):
"""default audio stream is None for video without audio"""
s = Streamer(get_video_asset("nasa_13013_no_audio.mp4"))
s = StreamReader(get_video_asset("nasa_13013_no_audio.mp4"))
assert s.default_audio_stream is None
def test_default_video_stream_none(self):
"""default video stream is None for video with only audio"""
s = Streamer(get_video_asset("nasa_13013_no_video.mp4"))
s = StreamReader(get_video_asset("nasa_13013_no_video.mp4"))
assert s.default_video_stream is None
def test_num_out_stream(self):
"""num_out_streams gives the correct count of output streams"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
n, m = 6, 4
for i in range(n):
assert s.num_out_streams == i
......@@ -158,7 +158,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_basic_audio_stream(self):
"""`add_basic_audio_stream` constructs a correct filter."""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
s.add_basic_audio_stream(frames_per_chunk=-1, dtype=None)
s.add_basic_audio_stream(frames_per_chunk=-1, sample_rate=8000)
s.add_basic_audio_stream(frames_per_chunk=-1, dtype=torch.int16)
......@@ -177,7 +177,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_basic_video_stream(self):
"""`add_basic_video_stream` constructs a correct filter."""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
s.add_basic_video_stream(frames_per_chunk=-1, format=None)
s.add_basic_video_stream(frames_per_chunk=-1, width=3, height=5)
s.add_basic_video_stream(frames_per_chunk=-1, frame_rate=7)
......@@ -201,7 +201,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_remove_streams(self):
"""`remove_stream` removes the correct output stream"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
s.add_basic_audio_stream(frames_per_chunk=-1, sample_rate=24000)
s.add_basic_video_stream(frames_per_chunk=-1, width=16, height=16)
s.add_basic_audio_stream(frames_per_chunk=-1, sample_rate=8000)
......@@ -221,7 +221,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_remove_stream_invalid(self):
"""Attempt to remove invalid output streams raises IndexError"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
for i in range(-3, 3):
with self.assertRaises(IndexError):
s.remove_stream(i)
......@@ -235,7 +235,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_process_packet(self):
"""`process_packet` method returns 0 while there is a packet in source stream"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
# nasa_1013.mp3 contains 1023 packets.
for _ in range(1023):
code = s.process_packet()
......@@ -246,19 +246,19 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_pop_chunks_no_output_stream(self):
"""`pop_chunks` method returns empty list when there is no output stream"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
assert s.pop_chunks() == []
def test_pop_chunks_empty_buffer(self):
"""`pop_chunks` method returns None when a buffer is empty"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
s.add_basic_audio_stream(frames_per_chunk=-1)
s.add_basic_video_stream(frames_per_chunk=-1)
assert s.pop_chunks() == [None, None]
def test_pop_chunks_exhausted_stream(self):
"""`pop_chunks` method returns None when the source stream is exhausted"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
# video is 16.57 seconds.
# audio streams per 10 second chunk
# video streams per 20 second chunk
......@@ -284,14 +284,14 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_stream_empty(self):
"""`stream` fails when no output stream is configured"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
with self.assertRaises(RuntimeError):
next(s.stream())
def test_stream_smoke_test(self):
"""`stream` streams chunks fine"""
w, h = 256, 198
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
s.add_basic_audio_stream(frames_per_chunk=2000, sample_rate=8000)
s.add_basic_video_stream(frames_per_chunk=15, frame_rate=60, width=w, height=h)
for i, (achunk, vchunk) in enumerate(s.stream()):
......@@ -302,7 +302,7 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_seek(self):
"""Calling `seek` multiple times should not segfault"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
for i in range(10):
s.seek(i)
for _ in range(0):
......@@ -312,13 +312,13 @@ class StreamerInterfaceTest(TempDirMixin, TorchaudioTestCase):
def test_seek_negative(self):
"""Calling `seek` with negative value should raise an exception"""
s = Streamer(get_video_asset())
s = StreamReader(get_video_asset())
with self.assertRaises(ValueError):
s.seek(-1.0)
@skipIfNoFFmpeg
class StreamerAudioTest(TempDirMixin, TorchaudioTestCase):
class StreamReaderAudioTest(TempDirMixin, TorchaudioTestCase):
"""Test suite for audio streaming"""
def _get_reference_wav(self, sample_rate, channels_first=False, **kwargs):
......@@ -328,7 +328,7 @@ class StreamerAudioTest(TempDirMixin, TorchaudioTestCase):
return path, data
def _test_wav(self, path, original, dtype):
s = Streamer(path)
s = StreamReader(path)
s.add_basic_audio_stream(frames_per_chunk=-1, dtype=dtype)
s.process_all_packets()
(output,) = s.pop_chunks()
......@@ -357,7 +357,7 @@ class StreamerAudioTest(TempDirMixin, TorchaudioTestCase):
expected = torch.flip(original, dims=(0,))
s = Streamer(path)
s = StreamReader(path)
s.add_audio_stream(frames_per_chunk=-1, filter_desc="areverse")
s.process_all_packets()
(output,) = s.pop_chunks()
......@@ -372,7 +372,7 @@ class StreamerAudioTest(TempDirMixin, TorchaudioTestCase):
path, original = self._get_reference_wav(1, dtype=dtype, num_channels=num_channels, num_frames=30)
for t in range(10, 20):
expected = original[t:, :]
s = Streamer(path)
s = StreamReader(path)
s.add_audio_stream(frames_per_chunk=-1)
s.seek(float(t))
s.process_all_packets()
......@@ -383,7 +383,7 @@ class StreamerAudioTest(TempDirMixin, TorchaudioTestCase):
"""Calling `seek` after streaming is started should change the position properly"""
path, original = self._get_reference_wav(1, dtype="int16", num_channels=2, num_frames=30)
s = Streamer(path)
s = StreamReader(path)
s.add_audio_stream(frames_per_chunk=-1)
ts = list(range(20)) + list(range(20, 0, -1)) + list(range(20))
......@@ -409,7 +409,7 @@ class StreamerAudioTest(TempDirMixin, TorchaudioTestCase):
8000, dtype="int16", num_channels=num_channels, num_frames=num_frames, channels_first=False
)
s = Streamer(path)
s = StreamReader(path)
s.add_audio_stream(frames_per_chunk=frames_per_chunk, buffer_chunk_size=buffer_chunk_size)
i, outputs = 0, []
for (output,) in s.stream():
......@@ -422,7 +422,7 @@ class StreamerAudioTest(TempDirMixin, TorchaudioTestCase):
@skipIfNoFFmpeg
class StreamerImageTest(TempDirMixin, TorchaudioTestCase):
class StreamReaderImageTest(TempDirMixin, TorchaudioTestCase):
def _get_reference_png(self, width: int, height: int, grayscale: bool):
original = get_image(width, height, grayscale=grayscale)
path = self.get_temp_path("ref.png")
......@@ -430,7 +430,7 @@ class StreamerImageTest(TempDirMixin, TorchaudioTestCase):
return path, original
def _test_png(self, path, original, format=None):
s = Streamer(path)
s = StreamReader(path)
s.add_basic_video_stream(frames_per_chunk=-1, format=format)
s.process_all_packets()
(output,) = s.pop_chunks()
......@@ -456,7 +456,7 @@ class StreamerImageTest(TempDirMixin, TorchaudioTestCase):
path, original = self._get_reference_png(w, h, grayscale=False)
expected = torch.flip(original, dims=(index,))[None, ...]
s = Streamer(path)
s = StreamReader(path)
s.add_video_stream(frames_per_chunk=-1, filter_desc=filter_desc)
s.process_all_packets()
output = s.pop_chunks()[0]
......
......@@ -37,7 +37,7 @@ _BUILD_MAD = _get_build("BUILD_MAD", False)
_BUILD_KALDI = False if platform.system() == "Windows" else _get_build("BUILD_KALDI", True)
_BUILD_RNNT = _get_build("BUILD_RNNT", True)
_BUILD_CTC_DECODER = False if platform.system() == "Windows" else _get_build("BUILD_CTC_DECODER", True)
_BUILD_FFMPEG = _get_build("BUILD_FFMPEG", False)
_USE_FFMPEG = _get_build("USE_FFMPEG", False)
_USE_ROCM = _get_build("USE_ROCM", torch.cuda.is_available() and torch.version.hip is not None)
_USE_CUDA = _get_build("USE_CUDA", torch.cuda.is_available() and torch.version.hip is None)
_USE_OPENMP = _get_build("USE_OPENMP", True) and "ATen parallel backend: OpenMP" in torch.__config__.parallel_info()
......@@ -56,7 +56,7 @@ def get_ext_modules():
Extension(name="torchaudio._torchaudio_decoder", sources=[]),
]
)
if _BUILD_FFMPEG:
if _USE_FFMPEG:
modules.append(Extension(name="torchaudio.lib.libtorchaudio_ffmpeg", sources=[]))
return modules
......@@ -97,7 +97,6 @@ class CMakeBuild(build_ext):
f"-DPython_INCLUDE_DIR={distutils.sysconfig.get_python_inc()}",
f"-DBUILD_SOX:BOOL={'ON' if _BUILD_SOX else 'OFF'}",
f"-DBUILD_MAD:BOOL={'ON' if _BUILD_MAD else 'OFF'}",
f"-DBUILD_FFMPEG:BOOL={'ON' if _BUILD_FFMPEG else 'OFF'}",
f"-DBUILD_KALDI:BOOL={'ON' if _BUILD_KALDI else 'OFF'}",
f"-DBUILD_RNNT:BOOL={'ON' if _BUILD_RNNT else 'OFF'}",
f"-DBUILD_CTC_DECODER:BOOL={'ON' if _BUILD_CTC_DECODER else 'OFF'}",
......@@ -105,6 +104,7 @@ class CMakeBuild(build_ext):
f"-DUSE_ROCM:BOOL={'ON' if _USE_ROCM else 'OFF'}",
f"-DUSE_CUDA:BOOL={'ON' if _USE_CUDA else 'OFF'}",
f"-DUSE_OPENMP:BOOL={'ON' if _USE_OPENMP else 'OFF'}",
f"-DUSE_FFMPEG:BOOL={'ON' if _USE_FFMPEG else 'OFF'}",
]
build_args = ["--target", "install"]
# Pass CUDA architecture to cmake
......
from torchaudio import _extension # noqa: F401
from torchaudio import (
io,
compliance,
datasets,
functional,
......@@ -22,6 +23,7 @@ except ImportError:
pass
__all__ = [
"io",
"compliance",
"datasets",
"functional",
......
......@@ -170,7 +170,7 @@ endif()
################################################################################
# libtorchaudio_ffmpeg
################################################################################
if(BUILD_FFMPEG)
if(USE_FFMPEG)
set(
LIBTORCHAUDIO_FFMPEG_SOURCES
ffmpeg/prototype.cpp
......
_INITIALIZED = False
_LAZILY_IMPORTED = [
"StreamReader",
"StreamReaderSourceStream",
"StreamReaderSourceAudioStream",
"StreamReaderSourceVideoStream",
"StreamReaderOutputStream",
]
def _init_extension():
import torch
import torchaudio
try:
torchaudio._extension._load_lib("libtorchaudio_ffmpeg")
except OSError as err:
raise ImportError(
"Stream API requires FFmpeg libraries (libavformat and such). Please install FFmpeg 4."
) from err
try:
torch.ops.torchaudio.ffmpeg_init()
except RuntimeError as err:
raise RuntimeError(
"Stream API requires FFmpeg binding. Please set USE_FFMPEG=1 when building from source."
) from err
global _INITIALIZED
_INITIALIZED = True
def __getattr__(name: str):
if name in _LAZILY_IMPORTED:
if not _INITIALIZED:
_init_extension()
from . import _stream_reader
item = getattr(_stream_reader, name)
globals()[name] = item
return item
raise AttributeError(f"module {__name__} has no attribute {name}")
def __dir__():
return sorted(__all__ + _LAZILY_IMPORTED)
__all__ = []
......@@ -8,8 +8,8 @@ import torchaudio
@dataclass
class SourceStream:
"""SourceStream()
class StreamReaderSourceStream:
"""StreamReaderSourceStream()
The metadata of a source stream. This class is used when representing streams of
media type other than `audio` or `video`.
......@@ -58,8 +58,8 @@ class SourceStream:
@dataclass
class SourceAudioStream(SourceStream):
"""SourceAudioStream()
class StreamReaderSourceAudioStream(StreamReaderSourceStream):
"""StreamReaderSourceAudioStream()
The metadata of an audio source stream.
......@@ -75,8 +75,8 @@ class SourceAudioStream(SourceStream):
@dataclass
class SourceVideoStream(SourceStream):
"""SourceVideoStream()
class StreamReaderSourceVideoStream(StreamReaderSourceStream):
"""StreamReaderSourceVideoStream()
The metadata of a video source stream.
......@@ -114,7 +114,7 @@ def _parse_si(i):
codec_name = i[_CODEC]
codec_long_name = i[_CODEC_LONG]
if media_type == "audio":
return SourceAudioStream(
return StreamReaderSourceAudioStream(
media_type,
codec_name,
codec_long_name,
......@@ -124,7 +124,7 @@ def _parse_si(i):
i[_NUM_CHANNELS],
)
if media_type == "video":
return SourceVideoStream(
return StreamReaderSourceVideoStream(
media_type,
codec_name,
codec_long_name,
......@@ -134,14 +134,14 @@ def _parse_si(i):
i[_HEIGHT],
i[_FRAME_RATE],
)
return SourceStream(media_type, codec_name, codec_long_name, None, None)
return StreamReaderSourceStream(media_type, codec_name, codec_long_name, None, None)
@dataclass
class OutputStream:
class StreamReaderOutputStream:
"""OutputStream()
Output stream configured on :py:class:`Streamer`.
Output stream configured on :py:class:`StreamReader`.
"""
source_index: int
......@@ -151,10 +151,10 @@ class OutputStream:
def _parse_oi(i):
return OutputStream(i[0], i[1])
return StreamReaderOutputStream(i[0], i[1])
class Streamer:
class StreamReader:
"""Fetch and decode audio/video streams chunk by chunk.
For the detailed usage of this class, please refer to the tutorial.
......@@ -239,7 +239,7 @@ class Streamer:
"""
return self._i_video
def get_src_stream_info(self, i: int) -> torchaudio.prototype.io.SourceStream:
def get_src_stream_info(self, i: int) -> torchaudio.io.StreamReaderSourceStream:
"""Get the metadata of source stream
Args:
......@@ -249,7 +249,7 @@ class Streamer:
"""
return _parse_si(torch.ops.torchaudio.ffmpeg_streamer_get_src_stream_info(self._s, i))
def get_out_stream_info(self, i: int) -> torchaudio.prototype.io.OutputStream:
def get_out_stream_info(self, i: int) -> torchaudio.io.StreamReaderOutputStream:
"""Get the metadata of output stream
Args:
......@@ -278,7 +278,7 @@ class Streamer:
"""Add output audio stream
Args:
frames_per_chunk (int): Number of frames returned by Streamer as a chunk.
frames_per_chunk (int): Number of frames returned by StreamReader as a chunk.
If the source stream is exhausted before enough frames are buffered,
then the chunk is returned as-is.
......@@ -314,7 +314,7 @@ class Streamer:
"""Add output video stream
Args:
frames_per_chunk (int): Number of frames returned by Streamer as a chunk.
frames_per_chunk (int): Number of frames returned by StreamReader as a chunk.
If the source stream is exhausted before enough frames are buffered,
then the chunk is returned as-is.
......@@ -361,7 +361,7 @@ class Streamer:
"""Add output audio stream
Args:
frames_per_chunk (int): Number of frames returned by Streamer as a chunk.
frames_per_chunk (int): Number of frames returned by StreamReader as a chunk.
If the source stream is exhausted before enough frames are buffered,
then the chunk is returned as-is.
......@@ -408,7 +408,7 @@ class Streamer:
"""Add output video stream
Args:
frames_per_chunk (int): Number of frames returned by Streamer as a chunk.
frames_per_chunk (int): Number of frames returned by StreamReader as a chunk.
If the source stream is exhausted before enough frames are buffered,
then the chunk is returned as-is.
......@@ -446,7 +446,7 @@ class Streamer:
Example - HW decoding::
>>> # Decode video with NVDEC, create Tensor on CPU.
>>> streamer = Streamer(src="input.mp4")
>>> streamer = StreamReader(src="input.mp4")
>>> streamer.add_video_stream(10, decoder="h264_cuvid", hw_accel=None)
>>>
>>> chunk, = next(streamer.stream())
......@@ -454,7 +454,7 @@ class Streamer:
... cpu
>>> # Decode video with NVDEC, create Tensor directly on CUDA
>>> streamer = Streamer(src="input.mp4")
>>> streamer = StreamReader(src="input.mp4")
>>> streamer.add_video_stream(10, decoder="h264_cuvid", hw_accel="cuda:1")
>>>
>>> chunk, = next(streamer.stream())
......@@ -462,7 +462,7 @@ class Streamer:
... cuda:1
>>> # Decode and resize video with NVDEC, create Tensor directly on CUDA
>>> streamer = Streamer(src="input.mp4")
>>> streamer = StreamReader(src="input.mp4")
>>> streamer.add_video_stream(
>>> 10, decoder="h264_cuvid",
>>> decoder_options={"resize": "240x360"}, hw_accel="cuda:1")
......@@ -595,10 +595,10 @@ class Streamer:
Arguments:
timeout (float or None, optional): See
:py:func:`~Streamer.process_packet`. (Default: ``None``)
:py:func:`~StreamReader.process_packet`. (Default: ``None``)
backoff (float, optional): See
:py:func:`~Streamer.process_packet`. (Default: ``10.0``)
:py:func:`~StreamReader.process_packet`. (Default: ``10.0``)
Returns:
Iterator[Tuple[Optional[torch.Tensor], ...]]:
......
_INITIALIZED = False
_LAZILY_IMPORTED = [
"Streamer",
"SourceStream",
"SourceAudioStream",
"SourceVideoStream",
"OutputStream",
]
def _init_extension():
import torch
import torchaudio
try:
torchaudio._extension._load_lib("libtorchaudio_ffmpeg")
except OSError as err:
raise ImportError(
"Stream API requires FFmpeg libraries (libavformat and such). Please install FFmpeg 4."
) from err
try:
torch.ops.torchaudio.ffmpeg_init()
except RuntimeError as err:
raise RuntimeError(
"Stream API requires FFmpeg binding. Please set BUILD_FFMPEG=1 when building from source."
) from err
global _INITIALIZED
_INITIALIZED = True
def __getattr__(name: str):
if name in _LAZILY_IMPORTED:
if not _INITIALIZED:
_init_extension()
if name == "Streamer":
import warnings
from torchaudio.io import StreamReader
from . import streamer
warnings.warn(
f"{__name__}.{name} has been moved to torchaudio.io.StreamReader. Please use torchaudio.io.StreamReader",
DeprecationWarning,
)
item = getattr(streamer, name)
globals()[name] = item
return item
global Streamer
Streamer = StreamReader
return Streamer
raise AttributeError(f"module {__name__} has no attribute {name}")
def __dir__():
return sorted(__all__ + _LAZILY_IMPORTED)
__all__ = []
return ["Streamer"]