Unverified commit 31f32b37 authored by Tim Moon, committed by GitHub

Explicitly use `python3` and `pip3` executables (#1486)



* Explicitly use python3 and pip3
  Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Run pre-commit as Python module
  Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Replace some missed references to "python" or "pip"
  Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
parent 8487e506
@@ -173,7 +173,7 @@ To install the latest stable version of Transformer Engine,
 .. code-block:: bash
-    pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
+    pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable
 This will automatically detect if any supported deep learning frameworks are installed and build
 Transformer Engine support for them. To explicitly specify frameworks, set the environment variable

@@ -184,7 +184,7 @@ Alternatively, the package can be directly installed from
 .. code-block:: bash
-    pip install transformer_engine[pytorch]
+    pip3 install transformer_engine[pytorch]
 To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be
 explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]).
@@ -34,7 +34,7 @@ Transformer Engine can be directly installed from `our PyPI <https://pypi.org/pr
 .. code-block:: bash
-    pip install transformer_engine[pytorch]
+    pip3 install transformer_engine[pytorch]
 To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]). Transformer Engine ships wheels for the core library. Source distributions are shipped for the JAX and PyTorch extensions.
@@ -54,7 +54,7 @@ Execute the following command to install the latest stable version of Transforme
 .. code-block:: bash
-    pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
+    pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable
 This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`).
@@ -71,7 +71,7 @@ Execute the following command to install the latest development build of Transfo
 .. code-block:: bash
-    pip install git+https://github.com/NVIDIA/TransformerEngine.git@main
+    pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@main
 This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`). To only build the framework-agnostic C++ API, set `NVTE_FRAMEWORK=none`.
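The `NVTE_FRAMEWORK` convention described above (a comma-separated list, with `none` meaning a C++-only build and an unset value meaning auto-detection) can be illustrated with a small sketch. The helper name and the `None` fallback are assumptions for illustration, not part of Transformer Engine's build code.

```python
import os

def parse_nvte_framework(value):
    """Hypothetical parser for the NVTE_FRAMEWORK convention above.

    "jax,pytorch" -> ["jax", "pytorch"]; "none" -> [] (C++ API only);
    unset/empty -> None, meaning "auto-detect installed frameworks".
    """
    if not value:
        return None
    if value.strip().lower() == "none":
        return []
    return [fw.strip() for fw in value.split(",") if fw.strip()]

frameworks = parse_nvte_framework(os.environ.get("NVTE_FRAMEWORK"))
```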
@@ -79,7 +79,7 @@ In order to install a specific PR, execute (after changing NNN to the PR number)
 .. code-block:: bash
-    pip install git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge
+    pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge
Installation (from source)
@@ -94,7 +94,7 @@ Execute the following commands to install Transformer Engine from source:
 cd TransformerEngine
 export NVTE_FRAMEWORK=pytorch   # Optionally set framework
-pip install .                   # Build and install
+pip3 install .                  # Build and install
 If the Git repository has already been cloned, make sure to also clone the submodules:
@@ -106,10 +106,10 @@ Extra dependencies for testing can be installed by setting the "test" option:
 .. code-block:: bash
-    pip install .[test]
+    pip3 install .[test]
 To build the C++ extensions with debug symbols, e.g. with the `-g` flag:
 .. code-block:: bash
-    pip install . --global-option=--debug
+    pip3 install . --global-option=--debug
@@ -6,7 +6,7 @@ set -e
 # Find TE
 : ${TE_PATH:=/opt/transformerengine}
-TE_LIB_PATH=`pip show transformer-engine | grep Location | cut -d ' ' -f 2`
+TE_LIB_PATH=`pip3 show transformer-engine | grep Location | cut -d ' ' -f 2`
 export LD_LIBRARY_PATH=$TE_LIB_PATH:$LD_LIBRARY_PATH
 cd $TE_PATH/tests/cpp
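The `grep Location | cut` pipeline above fails silently if `pip3 show` output ever shifts; a Python sketch of the same extraction makes the intent explicit. The helper name is hypothetical and the sample text merely mimics the `Location:` field format, not a real package's full output.

```python
def parse_location(pip_show_output):
    """Extract the Location field from `pip3 show` output,
    mirroring the grep/cut pipeline in the script above."""
    for line in pip_show_output.splitlines():
        if line.startswith("Location: "):
            return line.split(" ", 1)[1].strip()
    return None  # distribution not installed or field absent

# Illustrative text only; a real `pip3 show transformer-engine` prints more fields.
sample = "Name: transformer-engine\nLocation: /usr/local/lib/python3.10/dist-packages"
```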
@@ -6,10 +6,10 @@ set -xe
 : ${TE_PATH:=/opt/transformerengine}
-pip install -r $TE_PATH/examples/jax/encoder/requirements.txt
+pip3 install -r $TE_PATH/examples/jax/encoder/requirements.txt
 # Make encoder tests run-to-run deterministic for stable CI results
 export XLA_FLAGS="${XLA_FLAGS} --xla_gpu_deterministic_ops"
-pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/encoder/test_multigpu_encoder.py
+python3 -m pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/encoder/test_multigpu_encoder.py
-pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/encoder/test_model_parallel_encoder.py
+python3 -m pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/encoder/test_model_parallel_encoder.py
 . $TE_PATH/examples/jax/encoder/run_test_multiprocessing_encoder.sh
@@ -6,19 +6,19 @@ set -e
 : "${TE_PATH:=/opt/transformerengine}"
-pip install cpplint==1.6.0 pylint==3.3.1
+pip3 install cpplint==1.6.0 pylint==3.3.1
 if [ -z "${PYTHON_ONLY}" ]
 then
 cd $TE_PATH
 echo "Checking common API headers"
-cpplint --root transformer_engine/common/include --recursive transformer_engine/common/include
+python3 -m cpplint --root transformer_engine/common/include --recursive transformer_engine/common/include
 echo "Checking C++ files"
-cpplint --recursive --exclude=transformer_engine/common/include --exclude=transformer_engine/build_tools/build transformer_engine/common
+python3 -m cpplint --recursive --exclude=transformer_engine/common/include --exclude=transformer_engine/build_tools/build transformer_engine/common
-cpplint --recursive transformer_engine/jax
+python3 -m cpplint --recursive transformer_engine/jax
 fi
 if [ -z "${CPP_ONLY}" ]
 then
 cd $TE_PATH
 echo "Checking Python files"
-pylint --recursive=y transformer_engine/common transformer_engine/jax
+python3 -m pylint --recursive=y transformer_engine/common transformer_engine/jax
 fi
@@ -4,20 +4,20 @@
 set -xe
-pip install "nltk>=3.8.2"
+pip3 install "nltk>=3.8.2"
-pip install pytest==8.2.1
+pip3 install pytest==8.2.1
 : ${TE_PATH:=/opt/transformerengine}
-pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/tests/jax -k 'not distributed' --ignore=$TE_PATH/tests/jax/test_praxis_layers.py
+python3 -m pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/tests/jax -k 'not distributed' --ignore=$TE_PATH/tests/jax/test_praxis_layers.py
 # Test without custom calls
-NVTE_CUSTOM_CALLS_RE="" pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/tests/jax/test_custom_call_compute.py
+NVTE_CUSTOM_CALLS_RE="" python3 -m pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/tests/jax/test_custom_call_compute.py
-pip install -r $TE_PATH/examples/jax/mnist/requirements.txt
+pip3 install -r $TE_PATH/examples/jax/mnist/requirements.txt
-pip install -r $TE_PATH/examples/jax/encoder/requirements.txt
+pip3 install -r $TE_PATH/examples/jax/encoder/requirements.txt
-pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/mnist
+python3 -m pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/mnist
 # Make encoder tests run-to-run deterministic for stable CI results
 export XLA_FLAGS="${XLA_FLAGS} --xla_gpu_deterministic_ops"
-pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/encoder/test_single_gpu_encoder.py
+python3 -m pytest -c $TE_PATH/tests/jax/pytest.ini -v $TE_PATH/examples/jax/encoder/test_single_gpu_encoder.py
@@ -6,16 +6,16 @@ set -e
 : "${TE_PATH:=/opt/transformerengine}"
-pip install wheel
+pip3 install wheel
 cd $TE_PATH
-pip uninstall -y transformer-engine transformer-engine-cu12 transformer-engine-jax
+pip3 uninstall -y transformer-engine transformer-engine-cu12 transformer-engine-jax
 VERSION=`cat $TE_PATH/build_tools/VERSION.txt`
 WHL_BASE="transformer_engine-${VERSION}"
 # Core wheel.
-NVTE_RELEASE_BUILD=1 python setup.py bdist_wheel
+NVTE_RELEASE_BUILD=1 python3 setup.py bdist_wheel
 wheel unpack dist/*
 sed -i "s/Name: transformer-engine/Name: transformer-engine-cu12/g" "transformer_engine-${VERSION}/transformer_engine-${VERSION}.dist-info/METADATA"
 sed -i "s/Name: transformer_engine/Name: transformer_engine_cu12/g" "transformer_engine-${VERSION}/transformer_engine-${VERSION}.dist-info/METADATA"

@@ -23,13 +23,13 @@ mv "${WHL_BASE}/${WHL_BASE}.dist-info" "${WHL_BASE}/transformer_engine_cu12-${VE
 wheel pack ${WHL_BASE}
 rm dist/*.whl
 mv *.whl dist/
-NVTE_RELEASE_BUILD=1 NVTE_BUILD_METAPACKAGE=1 python setup.py bdist_wheel
+NVTE_RELEASE_BUILD=1 NVTE_BUILD_METAPACKAGE=1 python3 setup.py bdist_wheel
 cd transformer_engine/jax
-NVTE_RELEASE_BUILD=1 python setup.py sdist
+NVTE_RELEASE_BUILD=1 python3 setup.py sdist
-pip install dist/*
+pip3 install dist/*
 cd $TE_PATH
-pip install dist/*.whl --no-deps
+pip3 install dist/*.whl --no-deps
-python $TE_PATH/tests/jax/test_sanity_import.py
+python3 $TE_PATH/tests/jax/test_sanity_import.py
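The two `sed` substitutions in the script above rewrite the unpacked wheel's `METADATA` so the repacked wheel is published under the `-cu12` name. A hedged Python equivalent of just that rename step (the helper name is hypothetical, and the input is illustrative metadata text, not a real wheel's contents):

```python
def rename_distribution(metadata_text):
    # Mirror the two sed substitutions above: point both Name spellings
    # at the -cu12 package before repacking the wheel.
    text = metadata_text.replace("Name: transformer-engine", "Name: transformer-engine-cu12")
    return text.replace("Name: transformer_engine", "Name: transformer_engine_cu12")
```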
-#!/usr/bin/env python
+#!/usr/bin/env python3
 # coding: utf-8
 # Copyright (c) 2022-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

@@ -12,7 +12,7 @@ import json
 import datetime
 if len(sys.argv) < 2:
-    print("Usage: python copyright_checker.py <path>")
+    print("Usage: python3 copyright_checker.py <path>")
 path = sys.argv[1]
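The argument check above can be sketched as a standalone helper. The helper name is hypothetical, and this sketch raises `SystemExit` for testability, whereas the hunk above only shows a `print` (any surrounding exit logic in the real script is elided from this diff).

```python
def get_target_path(argv):
    # Mirror the check above: require a <path> argument, else report usage.
    if len(argv) < 2:
        raise SystemExit("Usage: python3 copyright_checker.py <path>")
    return argv[1]
```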
@@ -6,4 +6,4 @@ set -e
 : "${TE_PATH:=/opt/transformerengine}"
-python $TE_PATH/qa/L0_license/copyright_checker.py $TE_PATH
+python3 $TE_PATH/qa/L0_license/copyright_checker.py $TE_PATH
@@ -6,19 +6,19 @@ set -e
 : "${TE_PATH:=/opt/transformerengine}"
-pip install cpplint==1.6.0 pylint==3.3.1
+pip3 install cpplint==1.6.0 pylint==3.3.1
 if [ -z "${PYTHON_ONLY}" ]
 then
 cd $TE_PATH
 echo "Checking common API headers"
-cpplint --root transformer_engine/common/include --recursive transformer_engine/common/include
+python3 -m cpplint --root transformer_engine/common/include --recursive transformer_engine/common/include
 echo "Checking C++ files"
-cpplint --recursive --exclude=transformer_engine/common/include --exclude=transformer_engine/build_tools/build transformer_engine/common
+python3 -m cpplint --recursive --exclude=transformer_engine/common/include --exclude=transformer_engine/build_tools/build transformer_engine/common
-cpplint --recursive transformer_engine/pytorch
+python3 -m cpplint --recursive transformer_engine/pytorch
 fi
 if [ -z "${CPP_ONLY}" ]
 then
 cd $TE_PATH
 echo "Checking Python files"
-pylint --recursive=y transformer_engine/common transformer_engine/pytorch
+python3 -m pylint --recursive=y transformer_engine/common transformer_engine/pytorch
 fi
@@ -6,25 +6,25 @@ set -x
 : ${TE_PATH:=/opt/transformerengine}
-pip install pytest==8.2.1
+pip3 install pytest==8.2.1
 FAIL=0
-pytest -v -s $TE_PATH/tests/pytorch/test_sanity.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_sanity.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_recipe.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_recipe.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_deferred_init.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_deferred_init.py || FAIL=1
-PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 pytest -v -s $TE_PATH/tests/pytorch/test_numerics.py || FAIL=1
+PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_numerics.py || FAIL=1
-NVTE_CUDNN_MXFP8_NORM=0 PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 pytest -v -s $TE_PATH/tests/pytorch/test_cuda_graphs.py || FAIL=1
+NVTE_CUDNN_MXFP8_NORM=0 PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_cuda_graphs.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_jit.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_jit.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_fused_rope.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_fused_rope.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_float8tensor.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_float8tensor.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_gqa.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_gqa.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_fused_optimizer.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_fused_optimizer.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_multi_tensor.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_multi_tensor.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_fusible_ops.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_fusible_ops.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_permutation.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_permutation.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_parallel_cross_entropy.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_parallel_cross_entropy.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/test_cpu_offloading.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/test_cpu_offloading.py || FAIL=1
-NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=1 pytest -o log_cli=true --log-cli-level=INFO -v -s $TE_PATH/tests/pytorch/fused_attn/test_fused_attn.py || FAIL=1
+NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=1 python3 -m pytest -o log_cli=true --log-cli-level=INFO -v -s $TE_PATH/tests/pytorch/fused_attn/test_fused_attn.py || FAIL=1
 exit $FAIL
@@ -6,16 +6,16 @@ set -e
 : "${TE_PATH:=/opt/transformerengine}"
-pip install wheel
+pip3 install wheel
 cd $TE_PATH
-pip uninstall -y transformer-engine transformer-engine-cu12 transformer-engine-torch
+pip3 uninstall -y transformer-engine transformer-engine-cu12 transformer-engine-torch
 VERSION=`cat $TE_PATH/build_tools/VERSION.txt`
 WHL_BASE="transformer_engine-${VERSION}"
 # Core wheel.
-NVTE_RELEASE_BUILD=1 python setup.py bdist_wheel
+NVTE_RELEASE_BUILD=1 python3 setup.py bdist_wheel
 wheel unpack dist/*
 sed -i "s/Name: transformer-engine/Name: transformer-engine-cu12/g" "transformer_engine-${VERSION}/transformer_engine-${VERSION}.dist-info/METADATA"
 sed -i "s/Name: transformer_engine/Name: transformer_engine_cu12/g" "transformer_engine-${VERSION}/transformer_engine-${VERSION}.dist-info/METADATA"

@@ -23,13 +23,13 @@ mv "${WHL_BASE}/${WHL_BASE}.dist-info" "${WHL_BASE}/transformer_engine_cu12-${VE
 wheel pack ${WHL_BASE}
 rm dist/*.whl
 mv *.whl dist/
-NVTE_RELEASE_BUILD=1 NVTE_BUILD_METAPACKAGE=1 python setup.py bdist_wheel
+NVTE_RELEASE_BUILD=1 NVTE_BUILD_METAPACKAGE=1 python3 setup.py bdist_wheel
 cd transformer_engine/pytorch
-NVTE_RELEASE_BUILD=1 python setup.py sdist
+NVTE_RELEASE_BUILD=1 python3 setup.py sdist
-pip install dist/*
+pip3 install dist/*
 cd $TE_PATH
-pip install dist/*.whl --no-deps
+pip3 install dist/*.whl --no-deps
-python $TE_PATH/tests/pytorch/test_sanity_import.py
+python3 $TE_PATH/tests/pytorch/test_sanity_import.py
@@ -4,15 +4,15 @@
 : ${TE_PATH:=/opt/transformerengine}
-pip install pytest==8.2.1
+pip3 install pytest==8.2.1
 FAIL=0
-pytest -v -s $TE_PATH/tests/pytorch/distributed/test_numerics.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/distributed/test_numerics.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/distributed/test_fusible_ops.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/distributed/test_fusible_ops.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/distributed/test_torch_fsdp2.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/distributed/test_torch_fsdp2.py || FAIL=1
-pytest -v -s $TE_PATH/tests/pytorch/distributed/test_comm_gemm_overlap.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/distributed/test_comm_gemm_overlap.py || FAIL=1
-# pytest -v -s $TE_PATH/tests/pytorch/distributed/test_fusible_ops_with_userbuffers.py ### TODO Debug UB support with te.Sequential
+# python3 -m pytest -v -s $TE_PATH/tests/pytorch/distributed/test_fusible_ops_with_userbuffers.py || FAIL=1 ### TODO Debug UB support with te.Sequential
-pytest -v -s $TE_PATH/tests/pytorch/fused_attn/test_fused_attn_with_cp.py || FAIL=1
+python3 -m pytest -v -s $TE_PATH/tests/pytorch/fused_attn/test_fused_attn_with_cp.py || FAIL=1
 exit $FAIL
@@ -40,7 +40,7 @@ CUDA_DEVICE_MAX_CONNECTIONS=1
 NVTE_BIAS_GELU_NVFUSION=0
 NVTE_BIAS_DROPOUT_FUSION=0
-python
+python3
 -m torch.distributed.launch
 --use_env
 --nnodes=1
@@ -6,13 +6,13 @@ set -e
 : ${TE_PATH:=/opt/transformerengine}
-pip install pytest==8.2.1
+pip3 install pytest==8.2.1
 # Limit parallel build jobs to avoid overwhelming system resources
 export MAX_JOBS=4
 # Iterate over Flash Attention versions
-sm_arch=`python -c "import torch; sm = torch.cuda.get_device_capability(0); print(sm[0]*10+sm[1])"`
+sm_arch=`python3 -c "import torch; sm = torch.cuda.get_device_capability(0); print(sm[0]*10+sm[1])"`
 if [ $sm_arch -gt 90 ]
 then
 FA_versions=(2.7.3)

@@ -26,10 +26,10 @@ do
 # Build Flash Attention
 if [ "${fa_version}" \< "3.0.0" ]
 then
-pip install flash-attn==${fa_version}
+pip3 install flash-attn==${fa_version}
 else
-pip install "git+https://github.com/Dao-AILab/flash-attention.git@v2.7.2#egg=flashattn-hopper&subdirectory=hopper"
+pip3 install "git+https://github.com/Dao-AILab/flash-attention.git@v2.7.2#egg=flashattn-hopper&subdirectory=hopper"
-python_path=`python -c "import site; print(site.getsitepackages()[0])"`
+python_path=`python3 -c "import site; print(site.getsitepackages()[0])"`
 mkdir -p $python_path/flashattn_hopper
 wget -P $python_path/flashattn_hopper https://raw.githubusercontent.com/Dao-AILab/flash-attention/v2.7.2/hopper/flash_attn_interface.py
 fi
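The `sm_arch` one-liner above encodes a CUDA compute capability pair as a two-digit integer so the shell can compare it numerically; `-gt 90` then selects parts newer than Hopper (sm90). A sketch of the same encoding, runnable without `torch` or a GPU (the helper name is made up for illustration):

```python
def encode_sm_arch(major, minor):
    # (9, 0) -> 90, (8, 6) -> 86, matching the torch one-liner above.
    return major * 10 + minor
```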
@@ -11,5 +11,5 @@ set -e
 cd $TE_PATH
-pip install pre-commit
+pip3 install pre-commit
-pre-commit run --all-files
+python3 -m pre_commit run --all-files
@@ -26,7 +26,7 @@ enable_testing()
 include_directories(${gtest_SOURCE_DIR}/include ${gtest_SOURCE_DIR})
 if(NOT DEFINED TE_LIB_PATH)
-    execute_process(COMMAND bash -c "pip show transformer-engine | grep Location | cut -d ' ' -f 2 | tr -d '\n'"
+    execute_process(COMMAND bash -c "pip3 show transformer-engine | grep Location | cut -d ' ' -f 2 | tr -d '\n'"
                     OUTPUT_VARIABLE TE_LIB_PATH)
 endif()
@@ -34,7 +34,7 @@ TEST_ROOT = Path(__file__).parent.resolve()
 NUM_PROCS: int = torch.cuda.device_count()
 LAUNCH_CMD = ["torchrun", f"--nproc_per_node={NUM_PROCS}"]
 if tex.ubuf_built_with_mpi():
-    LAUNCH_CMD = ["mpirun", "-np", str(NUM_PROCS), "--oversubscribe", "--quiet", "python"]
+    LAUNCH_CMD = ["mpirun", "-np", str(NUM_PROCS), "--oversubscribe", "--quiet", "python3"]
 # Fall back on CUDA IPC if the platform does not support CUDA multicast
 if not tex.device_supports_multicast():
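The launcher selection above (default to `torchrun`, switch to `mpirun` driving `python3` when userbuffers were built with MPI) can be sketched as a pure function. The helper name is hypothetical; the command lists mirror the hunk above.

```python
def build_launch_cmd(num_procs, ubuf_built_with_mpi):
    # Mirror the selection above: torchrun by default, mpirun + python3
    # when the userbuffers extension was built with MPI support.
    if ubuf_built_with_mpi:
        return ["mpirun", "-np", str(num_procs), "--oversubscribe", "--quiet", "python3"]
    return ["torchrun", f"--nproc_per_node={num_procs}"]
```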
@@ -41,7 +41,7 @@ model_configs_flash_attn = {
 def get_bash_arguments(num_gpus_per_node, **kwargs):
     args = [
-        "python",
+        "python3",
         "-m",
         "torch.distributed.launch",
         "--nproc-per-node=" + str(num_gpus_per_node),
@@ -35,15 +35,15 @@ def _load_library():
     "TransformerEngine package version mismatch. Found"
     f" {module_name} v{version(module_name)}, transformer-engine"
     f" v{version('transformer-engine')}, and transformer-engine-cu12"
-    f" v{version('transformer-engine-cu12')}. Install transformer-engine using 'pip install"
-    " transformer-engine[jax]==VERSION'"
+    f" v{version('transformer-engine-cu12')}. Install transformer-engine using "
+    "'pip3 install transformer-engine[jax]==VERSION'"
 )
 if is_package_installed("transformer-engine-cu12"):
     if not is_package_installed(module_name):
         _logger.info(
-            "Could not find package %s. Install transformer-engine using 'pip"
-            " install transformer-engine[jax]==VERSION'",
+            "Could not find package %s. Install transformer-engine using "
+            "'pip3 install transformer-engine[jax]==VERSION'",
             module_name,
         )
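The `is_package_installed` checks above presumably resolve distribution metadata by name. A minimal sketch of such a check, assuming only the stdlib `importlib.metadata` (Python 3.8+); Transformer Engine's actual helper may be implemented differently.

```python
from importlib.metadata import PackageNotFoundError, version

def is_package_installed(dist_name):
    # True when the distribution's metadata is resolvable,
    # e.g. "transformer-engine" or "transformer-engine-cu12".
    try:
        version(dist_name)
        return True
    except PackageNotFoundError:
        return False
```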