- 14 May, 2025 1 commit
wenjh authored
Add rules for cuda_runtime.h, cuda_driver.h and cuda_nvml.h to hip.
Signed-off-by: wenjh <wenjh@sugon.com>
- 27 Mar, 2025 1 commit
yuguo authored
- 20 Mar, 2025 1 commit
yuguo authored
- 07 Feb, 2025 1 commit
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 29 Oct, 2024 1 commit
Alp Dener authored
* moved userbuffers code to TE/common
  Signed-off-by: Alp Dener <adener@nvidia.com>
* moved comm+GEMM overlap code to TE/common
  Signed-off-by: Alp Dener <adener@nvidia.com>
* removed PyTorch dependency from comm+GEMM overlap in TE/common
  Signed-off-by: Alp Dener <adener@nvidia.com>
* added TE/PyTorch wrappers for refactored comm+GEMM overlap code in TE/common
  Signed-off-by: Alp Dener <adener@nvidia.com>
* updated TE/PyTorch Python API to match the refactored comm+GEMM overlap code
  Signed-off-by: Alp Dener <adener@nvidia.com>
* updated unit tests to work with refactored comm+GEMM overlap code
  Signed-off-by: Alp Dener <adener@nvidia.com>
* added a pylint exception to comm+GEMM overlap test runner
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed linting errors
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* added documentation for te.initialize_ub
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed compile errors when building with NVTE_UB_WITH_MPI=1
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed default bootstrap backend
  Signed-off-by: Alp Dener <adener@nvidia.com>
* switched default bootstrap backend priority to MPI > Gloo > NCCL
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* updated bootstrap backend documentation
  Signed-off-by: Alp Dener <adener@nvidia.com>
* closed UB bootstrap socket to avoid interfering with CUDA Multicast shareable file handle send/recv
  Signed-off-by: Alp Dener <adener@nvidia.com>
* added torch::Tensor wrappers for communication buffer and atomic counters so PyTorch can factor externally allocated memory into its garbage collection threshold
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* automated handling of world, local and node ranks/sizes within C++ CommOverlapHelper to simplify Python function signatures
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed incorrect read of environment variables
  Signed-off-by: Alp Dener <adener@nvidia.com>
* corrected priority for _SOCKET_IFNAME environment variables in UB bootstrapping
  Signed-off-by: Alp Dener <adener@nvidia.com>
* moved multicast support check to cuda_runtime.h and replaced cudaDeviceGetProp call with cached sm_count()
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* removed commented-out old code and replaced external collective function type defines with aliases
  Signed-off-by: Alp Dener <adener@nvidia.com>
* added compile-time CUDA version guard for CUDA Driver Multicast attribute
  Signed-off-by: Alp Dener <adener@nvidia.com>
* added compile-time CUDA version guards to Multicast code in Userbuffers
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* condensed UB docs, corrected const violations
  Signed-off-by: Alp Dener <adener@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fixed autodoc rst for UB calls, added CUDA version guard on Multicast UB kernels
  Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed incorrect UB type reporting for P2P overlaps, comment reformatting
  Signed-off-by: Alp Dener <adener@nvidia.com>
* added docstring to tex.ubuf_built_with_mpi()
  Signed-off-by: Alp Dener <adener@nvidia.com>
---------
Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 14 Jun, 2024 1 commit
Kirthi Shankar Sivamani authored
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 10 Jun, 2024 1 commit
Kirthi Shankar Sivamani authored
Make transformer_engine::getenv independent of C++ ABI version
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 19 May, 2023 1 commit
Tim Moon authored
* Initial implementation of NVRTC infrastructure
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Initial NVRTC impl for transpose. NVRTC gives compilation errors at runtime. Everything else compiles and passes tests as expected.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug NVRTC transpose impl. NVRTC kernel compiles, runs, and passes tests with FP32.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use variadic template for kernel arguments in RTC kernel launch func
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Refactoring. Added utility header for CUDA Runtime API. Optimized concat_strings function.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add helper function for regex substitutions in strings
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add option to disable NVRTC support
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add support for header includes in NVRTC kernels
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Access lazily-initialized CUDA driver lib and add option to specify CUDA header dir
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Configure NVRTC transpose kernel with simple perf model
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Revert change to tests
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Style fixes
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add prime-valued test cases
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix multiple definition error
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Optimize NVRTC transpose kernel for small data sizes
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Mention NVRTC in docs
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add unit tests for NVRTC and string utils
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add comment in install docs about NVRTC. Review suggestion from @nouiz.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug perf model for RTC transpose kernel
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove NVRTC discussion from docs
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Require CUDA headers unless NVRTC is explicitly disabled
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use diagonal coords in transpose kernel to avoid partition camping
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use std::call_once for thread-safety
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Minor fixes
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug CMake error
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove unnecessary call_once
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove diagonal coordinates from transpose kernel
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use size_t indices instead of int
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Review suggestions from @ptrendx. Check build-time CUDA include path for run-time CUDA headers. Handle case where CUDA context is initially uninitialized.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>