Simplify the extension initialization process (#1734)

Calling `torch.[ops|classes].load_library(<PATH_TO_LIBRARY_FILE>)` is problematic when `torchaudio` is deployed in PEX format, because the library file does not exist as a file. Our extension module, when it exists, is guaranteed to have a PyBind11 binding even when no function is bound. This allows the library to be loaded with a regular Python `import` statement, which works even in PEX format. When the library is loaded, static initialization kicks in and the custom kernels bound via TorchScript also become available. This removes the need to call `torch.[ops|classes].load_library`. It works even when the implementation of the custom kernels is stripped from `_torchaudio.so`, as long as `_torchaudio.so` properly depends on the library that has the kernel implementations and static initialization.
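
The import-based pattern described above can be sketched in isolation as follows. This is a minimal illustration, not torchaudio's code: `is_module_available` here is a hypothetical stand-in for `_mod_utils.is_module_available`, and `import_extension` is a name invented for this example. It only shows the "check, then import, else warn" flow; the registration of custom ops happens inside the C++ library's static initialization and is not reproduced here.

```python
import importlib
import importlib.util
import warnings


def is_module_available(name: str) -> bool:
    """Return True if `name` can be located (hypothetical helper)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


def import_extension(name: str):
    """Import an optional extension module, or warn and return None.

    For a compiled extension, the import itself loads the shared library,
    so no file path has to be passed to a separate loader function.
    """
    if is_module_available(name):
        return importlib.import_module(name)
    warnings.warn(f"{name} is not available.")
    return None
```

Calling `import_extension("json")` returns the `json` module, while a name that cannot be found yields `None` and a warning, mirroring the `else` branch in the diff.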

Commit e8cc7f91, authored Aug 30, 2021 by moto; committed by GitHub Aug 30, 2021
Showing with 35 additions and 36 deletions

torchaudio/__init__.py +35 -1
torchaudio/extension/__init__.py +0 -12
torchaudio/extension/extension.py +0 -23

--- a/torchaudio/__init__.py
+++ b/torchaudio/__init__.py
-from . import extension  # noqa: F401
 from torchaudio._internal import module_utils as _mod_utils  # noqa: F401
+
+if _mod_utils.is_module_available('torchaudio._torchaudio'):
+    # Note this import has two purposes
+    # 1. Make _torchaudio accessible by the other modules (regular import)
+    # 2. Register torchaudio's custom ops bound via TorchScript
+    #
+    # For 2, normally function calls `torch.ops.load_library` and `torch.classes.load_library`
+    # are used. However, in our cases, this is inconvenient and unnecessary.
+    #
+    # - Why inconvenient?
+    # When torchaudio is deployed with `pex` format, all the files are deployed as a single zip
+    # file, and the extension module is not present as a file with full path. Therefore it is not
+    # possible to pass the path to library to `torch.[ops|classes].load_library` functions.
+    #
+    # - Why unnecessary?
+    # When torchaudio extension module (C++ module) is available, it is assumed that
+    # the extension contains both TorchScript-based binding and PyBind11-based binding.*
+    # Under this assumption, simply performing `from torchaudio import _torchaudio` will load the
+    # library which contains TorchScript-based binding as well, and the functions/classes bound
+    # via TorchScript become accessible under `torch.ops` and `torch.classes`.
+    #
+    # *Note that this holds true even when these two bindings are split into two library files and
+    # the library that contains PyBind11-based binding (`_torchaudio.so` in the following diagram)
+    # depends on the other one (`libtorchaudio.so`), because when the process tries to load
+    # `_torchaudio.so` it detects undefined symbols from `libtorchaudio.so` and automatically
+    # loads `libtorchaudio.so` (given that the library is found in a search path).
+    #
+    # [libtorchaudio.so] <- [_torchaudio.so]
+    #
+    #
+    from torchaudio import _torchaudio  # noqa
+else:
+    import warnings
+    warnings.warn('torchaudio C++ extension is not available.')
+
 from torchaudio import (
    compliance,
    datasets,

--- a/torchaudio/extension/__init__.py
+++ b/torchaudio/extension/__init__.py
-from .extension import (
-    _init_extension,
-)
-
-try:
-    from . import fb  # noqa
-except Exception:
-    pass
-
-_init_extension()
-
-del _init_extension
--- a/torchaudio/extension/extension.py
+++ b/torchaudio/extension/extension.py
-import warnings
-
-import torch
-from torchaudio._internal import module_utils as _mod_utils
-
-
-def _init_extension():
-    if _mod_utils.is_module_available('torchaudio._torchaudio'):
-        # Note this import has two purposes
-        # 1. to extract the path of the extension module so that
-        #    we can initialize the script module with the path.
-        # 2. so that torchaudio._torchaudio is accessible in other modules.
-        #    Look at sox_io_backend which uses `torchaudio._torchaudio.XXX`,
-        #    assuming that the module `_torchaudio` is accessible.
-        import torchaudio._torchaudio
-        _init_script_module(torchaudio._torchaudio.__file__)
-    else:
-        warnings.warn('torchaudio C++ extension is not available.')
-
-
-def _init_script_module(path):
-    torch.classes.load_library(path)
-    torch.ops.load_library(path)
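
The removed code relied on `torchaudio._torchaudio.__file__` being a usable path. The sketch below illustrates, under stated assumptions, why that can fail for zip-based deployment: it imports a pure-Python module from a zip archive using the standard library's zip import support and returns its `__file__`. The module name `zipped_demo_mod`, the archive name `bundle.zip`, and the function itself are hypothetical, and a pure-Python module is only a stand-in; how PEX handles compiled extensions is not shown here.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def file_path_of_zipped_module(module_name: str = "zipped_demo_mod") -> str:
    """Import a module from a zip archive and return its `__file__`."""
    tmp_dir = tempfile.mkdtemp()
    archive = os.path.join(tmp_dir, "bundle.zip")

    # Build a one-module archive, similar in spirit to a single-file bundle.
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{module_name}.py", "VALUE = 1\n")

    # Putting the archive on sys.path lets the zip importer find the module.
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)

    # The path points inside the archive, not at a real file on disk.
    return module.__file__
```

The returned path lies inside `bundle.zip`, so `os.path.isfile` reports `False` for it, and any API that expects a real file path cannot open it. A plain `import` needs no such path.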