Unverified commit f612b749 authored by satias10, committed by GitHub

docs: Document NVTE_CUDA_ARCHS environment variable in README (#2414)



Add NVTE_CUDA_ARCHS to README
Signed-off-by: Shoval Atias <satias@satias-mlt.client.nvidia.com>
Co-authored-by: Shoval Atias <satias@satias-mlt.client.nvidia.com>
parent 0056b981
@@ -259,6 +259,7 @@ These environment variables can be set before installation to customize the buil
* **NVTE_FRAMEWORK**: Comma-separated list of frameworks to build for (e.g., ``pytorch,jax``)
* **MAX_JOBS**: Limit number of parallel build jobs (default varies by system)
* **NVTE_BUILD_THREADS_PER_JOB**: Control threads per build job
* **NVTE_CUDA_ARCHS**: Semicolon-separated list of CUDA compute architectures to compile for (e.g., ``80;90`` for A100 and H100). If not set, automatically determined based on CUDA version. Setting this can significantly reduce build time and binary size.
Compiling with FlashAttention
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......
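The new ``NVTE_CUDA_ARCHS`` entry takes a semicolon-separated list such as ``80;90``. The sketch below shows, under stated assumptions, how a build script *might* split such a value and map each entry to an nvcc ``-gencode`` flag. The helper names ``parse_cuda_archs`` and ``gencode_flags`` are hypothetical and are not Transformer Engine's actual build code. Treating an unset or empty value as "fall back to automatic selection" mirrors the README's description, but the real selection logic is not shown here.

```python
import os
from typing import List, Optional


def parse_cuda_archs(value: Optional[str]) -> Optional[List[str]]:
    """Split a semicolon-separated arch list such as "80;90" (hypothetical helper).

    Returns None when the value is unset or empty, signalling that the
    caller should fall back to its own automatic architecture selection.
    """
    if value is None or not value.strip():
        return None
    # Strip whitespace and drop empty entries (e.g. from a trailing ";").
    return [arch.strip() for arch in value.split(";") if arch.strip()]


def gencode_flags(archs: List[str]) -> List[str]:
    """Build one nvcc -gencode flag per architecture (SASS only, no PTX).

    Hypothetical mapping for illustration; the real build may differ.
    """
    return [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]


if __name__ == "__main__":
    archs = parse_cuda_archs(os.environ.get("NVTE_CUDA_ARCHS"))
    if archs is None:
        print("NVTE_CUDA_ARCHS not set; using automatic selection")
    else:
        print(gencode_flags(archs))
```

When setting the variable from a POSIX shell, quote the value (``NVTE_CUDA_ARCHS="80;90"``): an unquoted ``;`` is parsed as a command separator, so only ``80`` would be assigned.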