Unverified Commit 37bbfc76, authored by Tim Moon and committed by GitHub

Refactor build system (#235)



* Refactor Setuptools build system

Successfully launches CMake install, but installs CMake extensions in temp dir.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug JAX build

Fix pybind11 import. Distinguish between build-time and run-time dependencies.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add helper function to determine dependencies

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add missing license

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug case where system CMake is too old

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add missing license

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Simplify sanity import tests

Just importing the modules provides richer error messages.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Properly install submodules

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Install helper library for TensorFlow

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update documentation

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Do not install Ninja by default

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Include Git commit hash in version string

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Override build_ext.build_extensions instead of build_ext.run

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix incorrect include path

Restore Ninja dependency. Restore overriding of the build_ext.run function.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Apply review suggestions from @nouiz

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Disable parallel Ninja jobs in GitHub Actions

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Properly install userbuffers library

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Tweak install docs

Review suggestion from @ksivaman.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add examples for specifying framework in docs

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
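The "Include Git commit hash in version string" change above can be sketched as follows. This is a hypothetical helper illustrating the idea, not the PR's actual setup code; the function name and base version are made up:

```python
import subprocess


def te_version(base_version: str = "0.9.0") -> str:
    """Append the current Git commit hash to a base version string.

    Falls back to the plain base version when the build is not running
    inside a Git checkout (e.g. building from an sdist tarball).
    """
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, check=True, text=True,
        ).stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a Git repo, or git is unavailable: use the base version.
        return base_version
    return f"{base_version}+{commit}"
```

Either way the result is a PEP 440-style version, with the commit hash carried as a local version label when available.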
parent 215dfe7e
@@ -22,7 +22,7 @@ jobs:
       - name: 'Build'
         run: |
           mkdir -p wheelhouse && \
-          NVTE_FRAMEWORK=pytorch pip wheel -w wheelhouse . -v
+          NVTE_FRAMEWORK=pytorch MAX_JOBS=1 pip wheel -w wheelhouse . -v
       - name: 'Upload wheel'
         uses: actions/upload-artifact@v3
         with:
@@ -47,7 +47,6 @@ jobs:
           submodules: recursive
       - name: 'Build'
         run: |
-          pip install ninja pybind11 && \
           pip install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html && \
           mkdir -p wheelhouse && \
           NVTE_FRAMEWORK=jax pip wheel -w wheelhouse . -v
@@ -74,7 +73,6 @@ jobs:
           submodules: recursive
       - name: 'Build'
         run: |
-          pip install ninja pybind11 && \
           mkdir -p wheelhouse && \
           NVTE_FRAMEWORK=tensorflow pip wheel -w wheelhouse . -v
       - name: 'Upload wheel'
@@ -34,12 +34,9 @@ pip - from GitHub
 Additional Prerequisites
 ^^^^^^^^^^^^^^^^^^^^^^^^
-1. `CMake <https://cmake.org/>`__ version 3.18 or later: `pip install cmake`.
-2. [For pyTorch support] `pyTorch <https://pytorch.org/>`__ with GPU support.
-3. [For JAX support] `JAX <https://github.com/google/jax/>`__ with GPU support, version >= 0.4.7.
-4. [For TensorFlow support] `TensorFlow <https://www.tensorflow.org/>`__ with GPU support.
-5. `pybind11`: `pip install pybind11`.
-6. [Optional] `Ninja <https://ninja-build.org/>`__: `pip install ninja`.
+1. [For PyTorch support] `PyTorch <https://pytorch.org/>`__ with GPU support.
+2. [For JAX support] `JAX <https://github.com/google/jax/>`__ with GPU support, version >= 0.4.7.
+3. [For TensorFlow support] `TensorFlow <https://www.tensorflow.org/>`__ with GPU support.

 Installation (stable release)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -48,11 +45,9 @@ Execute the following command to install the latest stable version of Transformer Engine:

 .. code-block:: bash

-    # Execute one of the following commands
-    NVTE_FRAMEWORK=pytorch pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable     # Build TE for PyTorch only. The default.
-    NVTE_FRAMEWORK=jax pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable         # Build TE for JAX only.
-    NVTE_FRAMEWORK=tensorflow pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable  # Build TE for TensorFlow only.
-    NVTE_FRAMEWORK=all pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable         # Build TE for all supported frameworks.
+    pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
+
+This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,tensorflow`).

 Installation (development build)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -67,12 +62,10 @@ Execute the following command to install the latest development build of Transformer Engine:

 .. code-block:: bash

-    # Execute one of the following commands
-    NVTE_FRAMEWORK=pytorch pip install git+https://github.com/NVIDIA/TransformerEngine.git@main     # Build TE for PyTorch only. The default.
-    NVTE_FRAMEWORK=jax pip install git+https://github.com/NVIDIA/TransformerEngine.git@main         # Build TE for JAX only.
-    NVTE_FRAMEWORK=tensorflow pip install git+https://github.com/NVIDIA/TransformerEngine.git@main  # Build TE for TensorFlow only.
-    NVTE_FRAMEWORK=all pip install git+https://github.com/NVIDIA/TransformerEngine.git@main         # Build TE for all supported frameworks.
+    pip install git+https://github.com/NVIDIA/TransformerEngine.git@main
+
+This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,tensorflow`).

 Installation (from source)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -80,14 +73,27 @@ Execute the following commands to install Transformer Engine from source:

 .. code-block:: bash

-    git clone --recursive https://github.com/NVIDIA/TransformerEngine.git  # Clone the repository/fork and checkout all submodules recursively.
-    cd TransformerEngine           # Enter TE directory.
-    git checkout stable            # Checkout the correct branch.
-    export NVTE_FRAMEWORK=pytorch  # Optionally set the framework.
-    pip install .                  # Build and install
+    # Clone repository, checkout stable branch, clone submodules
+    git clone --branch stable --recursive https://github.com/NVIDIA/TransformerEngine.git
+    cd TransformerEngine
+    export NVTE_FRAMEWORK=pytorch  # Optionally set framework
+    pip install .                  # Build and install

-For already cloned repos, run the following command in TE directory:
+If the Git repository has already been cloned, make sure to also clone the submodules:

 .. code-block:: bash

-    git submodule update --init --recursive  # Checkout all submodules recursively.
+    git submodule update --init --recursive
+
+Extra dependencies for testing can be installed by setting the "test" option:
+
+.. code-block:: bash
+
+    pip install .[test]
+
+To build the C++ extensions with debug symbols, e.g. with the `-g` flag:
+
+.. code-block:: bash
+
+    pip install . --global-option=--debug
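The automatic framework detection described in the revised docs above might look roughly like this. This is a hypothetical helper for illustration, not the actual setup.py logic:

```python
import importlib.util
import os


def detect_frameworks() -> list:
    """Return the frameworks Transformer Engine should be built for.

    The NVTE_FRAMEWORK environment variable, if set, overrides detection
    with a comma-separated list; otherwise each framework is enabled when
    its Python package is importable.
    """
    requested = os.getenv("NVTE_FRAMEWORK")
    if requested:
        return [name.strip() for name in requested.split(",")]
    detected = []
    # Map TE framework names to the packages that signal their presence.
    for framework, module in [("pytorch", "torch"),
                              ("jax", "jax"),
                              ("tensorflow", "tensorflow")]:
        if importlib.util.find_spec(module) is not None:
            detected.append(framework)
    return detected
```

Using `importlib.util.find_spec` avoids importing heavyweight frameworks at build time merely to test for their presence.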
This diff is collapsed.
@@ -2,11 +2,5 @@
 #
 # See LICENSE for license information.

-try:
-    import transformer_engine.jax
-    te_imported = True
-except:
-    te_imported = False
-assert te_imported, 'transformer_engine import failed'
+import transformer_engine.jax
 print("OK")
@@ -2,11 +2,5 @@
 #
 # See LICENSE for license information.

-try:
-    import transformer_engine.pytorch
-    te_imported = True
-except:
-    te_imported = False
-assert te_imported, 'transformer_engine import failed'
+import transformer_engine.pytorch
 print("OK")
@@ -2,11 +2,5 @@
 #
 # See LICENSE for license information.

-try:
-    import transformer_engine.tensorflow
-    te_imported = True
-except:
-    te_imported = False
-assert te_imported, 'transformer_engine import failed'
+import transformer_engine.tensorflow
 print("OK")
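The simplification in the three sanity tests above matters because a bare import surfaces the underlying error, while the old try/except pattern collapsed every failure into one generic assertion message. A small illustration (the module name is deliberately nonexistent):

```python
# Old pattern: any failure is reduced to a generic AssertionError.
try:
    import transformer_engine_nonexistent  # hypothetical missing module
    te_imported = True
except Exception:
    te_imported = False

old_style_error = None
try:
    assert te_imported, 'transformer_engine import failed'
except AssertionError as exc:
    old_style_error = str(exc)

# New pattern: the ImportError itself propagates, naming the missing
# module and keeping the full traceback of the root cause.
new_style_error = None
try:
    import transformer_engine_nonexistent
except ImportError as exc:
    new_style_error = str(exc)

print(old_style_error)  # generic, says nothing about the cause
print(new_style_error)  # names the missing module
```

With the new pattern, a CUDA library that fails to load, a missing submodule, or a version mismatch each produce their own distinct, actionable traceback instead of the same one-line assertion.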
@@ -28,16 +28,20 @@ include_directories(${PROJECT_SOURCE_DIR})
add_subdirectory(common)
if(NVTE_WITH_USERBUFFERS)
message(STATUS "userbuffers support enabled")
add_subdirectory(pytorch/csrc/userbuffers)
endif()
option(ENABLE_JAX "Enable JAX in the building workflow." OFF)
message(STATUS "JAX support: ${ENABLE_JAX}")
if(ENABLE_JAX)
find_package(pybind11 CONFIG REQUIRED)
add_subdirectory(jax)
endif()
option(ENABLE_TENSORFLOW "Enable TensorFlow in the building workflow." OFF)
message(STATUS "TensorFlow support: ${ENABLE_TENSORFLOW}")
if(ENABLE_TENSORFLOW)
find_package(pybind11 CONFIG REQUIRED)
add_subdirectory(tensorflow)
# Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# See LICENSE for license information.
add_library(CUDNN::cudnn_all INTERFACE IMPORTED)
find_path(
@@ -14,7 +18,7 @@ function(find_cudnn_library NAME)
HINTS $ENV{CUDNN_PATH} ${CUDNN_PATH} ${CUDAToolkit_LIBRARY_DIR}
PATH_SUFFIXES lib64 lib/x64 lib
)
if(${UPPERCASE_NAME}_LIBRARY)
add_library(CUDNN::${NAME} UNKNOWN IMPORTED)
set_target_properties(
@@ -48,7 +52,7 @@ if(CUDNN_INCLUDE_DIR AND CUDNN_LIBRARY)
message(STATUS "cuDNN: ${CUDNN_LIBRARY}")
message(STATUS "cuDNN: ${CUDNN_INCLUDE_DIR}")
set(CUDNN_FOUND ON CACHE INTERNAL "cuDNN Library Found")
else()
@@ -73,6 +77,5 @@ target_link_libraries(
     CUDNN::cudnn_adv_infer
     CUDNN::cudnn_cnn_infer
     CUDNN::cudnn_ops_infer
-    CUDNN::cudnn
     CUDNN::cudnn
 )
@@ -77,3 +77,6 @@ set_source_files_properties(fused_softmax/scaled_masked_softmax.cu
                             COMPILE_OPTIONS "--use_fast_math")
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -O3")
+
+# Install library
+install(TARGETS transformer_engine DESTINATION .)
@@ -10,3 +10,4 @@ pybind11_add_module(
 )
 target_link_libraries(transformer_engine_jax PRIVATE CUDA::cudart CUDA::cublas CUDA::cublasLt transformer_engine)
+install(TARGETS transformer_engine_jax DESTINATION .)
@@ -31,3 +31,6 @@ set_source_files_properties(userbuffers.cu
                             COMPILE_OPTIONS "$<$<COMPILE_LANGUAGE:CUDA>:-maxrregcount=64>")
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -O3")
+
+# Install library
+install(TARGETS transformer_engine_userbuffers DESTINATION .)
@@ -13,9 +13,9 @@ add_library(
 )

 # Includes
-execute_process(COMMAND ${Python_EXECUTABLE} -c "import tensorflow as tf; print(tf.sysconfig.get_include())"
+execute_process(COMMAND ${Python_EXECUTABLE} -c "import tensorflow as tf; print(tf.sysconfig.get_include())"
                 OUTPUT_VARIABLE Tensorflow_INCLUDE_DIRS OUTPUT_STRIP_TRAILING_WHITESPACE)
-execute_process(COMMAND ${Python_EXECUTABLE} -c "import numpy as np; print(np.get_include())"
+execute_process(COMMAND ${Python_EXECUTABLE} -c "import numpy as np; print(np.get_include())"
                 OUTPUT_VARIABLE Numpy_INCLUDE_DIRS OUTPUT_STRIP_TRAILING_WHITESPACE)
 target_include_directories(transformer_engine_tensorflow PRIVATE
@@ -25,7 +25,7 @@ target_include_directories(transformer_engine_tensorflow PRIVATE
 target_include_directories(_get_stream PRIVATE ${Tensorflow_INCLUDE_DIRS})

 # Libraries
-execute_process(COMMAND ${Python_EXECUTABLE} -c "import tensorflow as tf; print(tf.__file__)"
+execute_process(COMMAND ${Python_EXECUTABLE} -c "import tensorflow as tf; print(tf.__file__)"
                 OUTPUT_VARIABLE Tensorflow_LIB_PATH OUTPUT_STRIP_TRAILING_WHITESPACE)
 get_filename_component(Tensorflow_LIB_PATH ${Tensorflow_LIB_PATH} DIRECTORY)
 list(APPEND TF_LINKER_LIBS "${Tensorflow_LIB_PATH}/libtensorflow_framework.so.2")
@@ -40,3 +40,7 @@ target_link_libraries(_get_stream PRIVATE ${TF_LINKER_LIBS})
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -O3")
+
+# Install library
+install(TARGETS transformer_engine_tensorflow DESTINATION .)
+install(TARGETS _get_stream DESTINATION .)