- 10 Jul, 2024 1 commit
-
-
Tim Moon authored
Reduce CUDA driver API calls when choosing transpose kernels Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 Apr, 2024 1 commit
-
-
Tim Moon authored
* Add NVRTC kernels for cast-transpose Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update copyright year Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add noop flag to NVRTC cast-transpose kernel Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Apply suggestions from code review Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 12 Apr, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* FP8 cuda graphs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Co-authored-by:
Charlene Yang <charleney@nvidia.com> * Fix numerics Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * exclude torch compile from numerics tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More numerics fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix CI Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * rm fusion from unfused path Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Co-authored-by:
Charlene Yang <charleney@nvidia.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 19 May, 2023 1 commit
-
-
Tim Moon authored
* Initial implementation of NVRTC infrastructure Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial NVRTC impl for transpose NVRTC gives compilation errors at runtime. Everything else compiles and passes tests as expected. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug NVRTC transpose impl NVRTC kernel compiles, runs, and passes tests with FP32. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use variadic template for kernel arguments in RTC kernel launch func Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactoring Added utility header for CUDA Runtime API. Optimized concat_strings function. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add helper function for regex substitutions in strings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add option to disable NVRTC support Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add support for header includes in NVRTC kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Access lazily-initialized CUDA driver lib and add option to specify CUDA header dir Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Configure NVRTC transpose kernel with simple perf model Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Revert change to tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Style fixes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add prime-valued test cases Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix multiple definition error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Optimize NVRTC transpose kernel for small data sizes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Mention NVRTC in docs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add unit tests for NVRTC and string utils Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add comment in install docs about NVRTC Review suggestion from @nouiz Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug perf model for RTC transpose kernel Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove NVRTC discussion from docs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Require CUDA headers unless NVRTC is explicitly disabled Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use diagonal coords in transpose kernel to avoid partition camping Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use std::call_once for thread-safety Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor fixes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug CMake error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unnecessary call_once Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove diagonal coordinates from transpose kernel Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use size_t indices instead of int Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestions from @ptrendx Check build-time CUDA include path for run-time CUDA headers. Handle case where CUDA context is initially uninitialized. Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 12 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
* Add NVTX to TE modules Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix pylint Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix NVTX in _prepare_backward Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Add NVTX to C API Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix cpplint and link nvToolsExt Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Add NVTX to GeGlu Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 03 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 08 Dec, 2022 1 commit
-
-
Przemyslaw Tredak authored
* Move the amax/scale/scale_inv into the TE Tensor struct. Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Handle multi_cast_transpose Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Changed softmax to new Tensor Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * First pass at the cpp tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Round of fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix multi_cast_transpose Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix cast_to_fp8 Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Sep, 2022 1 commit
-
-
Przemek Tredak authored
Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-