-
Kunlun Li authored
* Add fp8_primary_weights support for blockwise scaling Signed-off-by:
kunlunl <kunlunl@nvidia.com> custom fsdp Signed-off-by:
kunlunl <kunlunl@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Add view to blockwise fp8 tensor Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix columnwise_shape in backward of view() Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Add comments to the unit of start_offset Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Add test for view and reshape for blockwise fp8 tensor Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Add implementation for self._columnwise_scale_inv is not existed Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Track down checks for _columnwise_data is None and adding checks for _columnwise_invalid Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add assertion to check whether ._quantizer is None Signed-off-by:
kunlunl <kunlunl@nvidia.com> * rename partial_cast.cu -> fp8_block_scaling_partial_cast.cu Signed-off-by:
kunlunl <kunlunl@nvidia.com> * rename partial_cast kernel to fp8_block_scaling_partial_cast kernel Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add shfl_sync in partial cast kernel Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Remove columnwise_invalid flag Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add comments about out-of-bounds write Signed-off-by:
kunlunl <kunlunl@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
4742c0f8