- 11 Jun, 2025 1 commit
Dmitrii Makarenko authored
* [xpu/triton] Add triton dequantization kernel
  This PR adds an XPU backend and a Triton kernel for dequantization of the nf4 dtype. Triton is an optional import.
  Tests:
  tests/test_functional.py::TestQuantize4BitFunctional: supported nf4/fp4 cases
  tests/test_functional.py::Test8BitBlockwiseQuantizeFunctional: implemented quantize_blockwise with binary search that works faster for XPU
  tests/test_linear4bit.py
  Signed-off-by: Dmitrii Makarenko <dmitrii.makarenko@intel.com>
* align with ipex code
* enable test for ipex
* test_kbit_backprop: skip no longer needed
* remove unused
---------
Signed-off-by: Dmitrii Makarenko <dmitrii.makarenko@intel.com>
- 02 Jun, 2025 1 commit
Matthew Douglas authored
* Tests: xfail opcheck for 4bit quantization with floating storage dtypes
* Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch
- 28 May, 2025 1 commit
jiqing-feng authored
* enable ipex
* fix cpu 8bit quantization
* fix int8 and nf4 cpu inference
* add cpu fp4 and rem
* fix dequantize nf4 xpu
* fix ipex op
* fix dequantize nf4 name
* fix dequantize nf4 ipex
* fix matmul8bitfp
* enable cpu tests
* fix format
* fix quantize blockwise output shape
* fix quant_storage bf16 and gemv cpu
* fix cpu tests
* fix xpu tests
* fix lib
* skip xpu dequantize blockwise op check
* fix matmul8bit
* skip not used function tests
* fix matmul8bit fp
* check ipex before MatMul8bitFp
* update ipex install guide
* update install guide
* fix error log
* fix error log
* update comment
* move torch op to default
* revert ipex check
* fix code table device
* fix code table device
* fix xpu ops
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
- 24 May, 2025 1 commit
Matthew Douglas authored
* General cleanup & test improvements
* Tests: work around numpy 2 compat issue for torch<2.3
* Tests: update aarch64 cpu min torch version
- 13 May, 2025 1 commit
Matthew Douglas authored
* Improvements for testing suite
* Add workflow for macOS arm64 CPU tests
- 28 Apr, 2025 1 commit
Matthew Douglas authored
* Additional 4bit CPU ops
* Implement additional device-agnostic ops and test updates
* More test fixes
* int8 tests passing
* Fix feature flag for multi_backend
- 22 Apr, 2025 1 commit
Matthew Douglas authored
* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit
* Make test suite more device-agnostic
* Additional device agnostic tests
* Additional device agnosticism for tests
* Add BNB_TEST_DEVICE env var to manually select device for unit tests
* Small bugfix for int8 test
* Exclude backward() from code coverage reports
* Params4bit: don't try to quantize when moving to meta device
- 27 Mar, 2025 1 commit
Matthew Douglas authored
* Testing cleanup
* More test cleanup
* Additional deprecations/removals.
* Skip benchmark, deprecated, slow tests by default
- 25 Mar, 2025 1 commit
Matthew Douglas authored
* Sketch out first custom op registration
* Add note
* Initial int8 op registration
* Cleanup some deprecated functions.
* Int8 ops updates; tests
* Implement 4bit quant/dequant ops
* Fix nested quant
* cleanup
* Test improvements
* Clean up and improve tests
* Add higher level custom op for int8 matmul + dequant + bias
* Add gemv 4bit custom op
* Cleanup
* Implement out kwarg overloads for custom ops
* Update PyTorch minimum to 2.1
* Deprecation updates
* Cleanup; rename int8_linear_dequant -> int8_scaled_mm
* Bump min pytorch to 2.2
* cleanup
* Test reorganization
* Remove deprecated supports_igemmlt
* More cleanup
* Cleanup obsolete C++/CUDA code
* Cleanup
* Create 'default' backend for fallback op implementations; initial CPU nf4 work
* Stub out for multi-platform
* Fix serialization tests for torch>=2.6.0
* Add example for torch.compile e2e inference
* Test update
---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>