- 13 Jun, 2023 1 commit
-
-
Przemyslaw Tredak authored
* Added ReLU and GLU variants to common Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * pyTorch changes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * PyTorch C++ lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Bug fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix storage errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Compute bgrad Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix numerical tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix ONNX export tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Review comments Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 May, 2023 1 commit
-
-
Tim Moon authored
* Initial implementation of NVRTC infrastructure Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial NVRTC impl for transpose NVRTC gives compilation errors at runtime. Everything else compiles and passes tests as expected. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug NVRTC transpose impl NVRTC kernel compiles, runs, and passes tests with FP32. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use variadic template for kernel arguments in RTC kernel launch func Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactoring Added utility header for CUDA Runtime API. Optimized concat_strings function. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add helper function for regex substitutions in strings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add option to disable NVRTC support Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add support for header includes in NVRTC kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Access lazily-initialized CUDA driver lib and add option to specify CUDA header dir Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Configure NVRTC transpose kernel with simple perf model Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Revert change to tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Style fixes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add prime-valued test cases Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix multiple definition error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Optimize NVRTC transpose kernel for small data sizes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Mention NVRTC in docs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add unit tests for NVRTC and string utils Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add comment in install docs about NVRTC Review suggestion from @nouiz Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug perf model for RTC transpose kernel Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove NVRTC discussion from docs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Require CUDA headers unless NVRTC is explicitly disabled Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use diagonal coords in transpose kernel to avoid partition camping Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use std::call_once for thread-safety Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor fixes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug CMake error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unnecessary call_once Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove diagonal coordinates from transpose kernel Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use size_t indices instead of int Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestions from @ptrendx Check build-time CUDA include path for run-time CUDA headers. Handle case where CUDA context is initially uninitialized. Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 17 Jan, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Move scale inverse calculation to framework Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * cleanup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix RMSNorm Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix gated kernel/geglu Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
* Add NVTX to TE modules Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix pylint Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix NVTX in _prepare_backward Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Add NVTX to C API Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix cpplint and link nvToolsExt Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Add NVTX to GeGlu Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 10 Jan, 2023 1 commit
-
-
zlsh80826 authored
* Add GeGLU and DGeGLU Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add DGeGLUCT Signed-off-by:
Reese Wang <rewang@nvidia.com> * Update copyright year Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine shape check Signed-off-by:
Reese Wang <rewang@nvidia.com> * Code refine Signed-off-by:
Reese Wang <rewang@nvidia.com> Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 03 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 08 Dec, 2022 1 commit
-
-
Przemyslaw Tredak authored
* Move the amax/scale/scale_inv into the TE Tensor struct. Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Handle multi_cast_transpose Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Changed softmax to new Tensor Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * First pass at the cpp tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Round of fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix multi_cast_transpose Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix cast_to_fp8 Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Sep, 2022 1 commit
-
-
Przemek Tredak authored
Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-