Blockwise scaling linear quantization recipe (#1559)
* Add GEMM logic for blockwise quantized tensors. GEMM test cases included in pytorch integration.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update NVTE_BLOCK_SCALING for GEMM.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Gate feature on CUDA 12.9
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Gemm typo.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Remove unnecessary type converter change.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Reflect epilogue availability and test supported epilogues.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* GEMM simplifications from recipe branch.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Format py code.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update GEMM DGelu tests to match support depending on output dtype.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Force pow2Scales in GEMM
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Add GEMM test to pytorch test suite.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Add copyright to GEMM test.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update import for GEMM test.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Add license.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update test gemm supported predicate.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Use sgemm like interfaces and naming.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Rewrite GEMM comment.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* MR Feedback.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Recipe setup for Linear modules.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Use 12.9 feature test.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Run against tensor dumps from internal library.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update FIXME to TODO with linked issue.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update full recompute feature to save recipe. The recompute context uses the same recipe and fp8 settings as the original fwd pass.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* MR Feedback. Avoid reusing quantizer objects.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update logic in module.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Format py.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update for PP bug.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update test numerics.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update force_power_of_2 scales in the recipe.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update usage method to satisfy upstream changes.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* fix subchannel recipe in distributed test with bf16 gather
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* Edit and cleanup BF16 gather code.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update test import.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* support columnwise only mode to 1D quantize kernel
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* Format and move enum
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Skip alloc.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* try async bf16 gather
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* Format python code.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Document and type code.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update pytorch lint errors.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Don't set high precision dtype.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Add test for sanity and CG; fix CG for sequential?
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Keep make_quantizers API stable. Update num_quantizers instead to pass cuda_graph tests.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Fix import name.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Rename recipe method.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Skip grouped linear sanity test.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Set usage before BF16 gather.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* refactor for nvte_quantize_v2
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* Format code.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Cleanup nvte_quantize_v2
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Test fp32 scales.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Disable CUDA graph.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Simplify layernorm linear
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Cleanup layernorm linear.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* LayerNorm linear bwd gather logic.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Communication updates.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update transformer_engine/pytorch/ops/op.py. Apply MR comment change.
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: kwyss-nvidia <kwyss@nvidia.com>
* Lint fix.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* MR feedback.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Enable cuda graph tests.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Reduce chance of spurious failure and reword.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Review suggestions from @timmoon10
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update CPP tests.
  Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update common.h
  Signed-off-by: Xin Yao <yaox12@outlook.com>
* Update test_float8blockwisetensor.py
  Signed-off-by: Xin Yao <yaox12@outlook.com>
---------
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: zhongboz <zhongboz@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: kwyss-nvidia <kwyss@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: zhongboz <zhongboz@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>