[C][PyTorch] Move multi tensors kernels from PyTorch extensions to core (#1744)
* Move multi tensors kernels from PyTorch extensions to core Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add int16 type to core (for storing fp32 param remainders) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix core build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * same fix to scale Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix perf, memory, vars Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Re-add device guard for multi-device Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix junk output dtype for non-per tensor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes for test and upgrade mcore version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix core tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Showing
Please register or sign in to comment