- 23 Oct, 2025 1 commit
limm authored
- 24 Sep, 2025 1 commit
pnunna93 authored
* Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip * Update 
kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py * warpSize is being made non constexpr in ROCm 7.0 * Merge pull request #90 from ROCm/IFU-rocm_enabled-09-23-2025 Ifu rocm enabled 09 23 2025 * Fix typo * unskip test_4bit_quant ---------
Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
Co-authored-by: sstamenk <strahinja.stamenkovic@amd.com>
- 15 Sep, 2025 1 commit
Liu Xiaoli authored
* Add SYCL Kernels for XPU backend * fix transpose Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix log and format Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * revert cpu changes Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * clean ipex_xpu Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * clean ipex import Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix ipex cpu import Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix typo Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix comments Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * refine gemv_4bit kernel * enable FP4 for dequant_4bit and gemv_4bit * refine FP4 dequantization performance * remove check for better performance Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix doc Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * clean code * fix tests Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * rm comments Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix memory issue * fix ut failure * adjust threshold Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix xpu check Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * change test_functional check Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix test_module Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix device check Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix tests Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * Enable Windows build and refine code * fix xpu log Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * remove ipex entirely Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix cpu int8 CB Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix lint Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix logs (#12) * fix logs Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * Fix sycl lint error and tests (#13) * fix sycl nd Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix tests Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * skip typo check for xpu kernel codes (#14) * skip test for xpu ops Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix lint Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * skip typo for xpu Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * skip Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * skip Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * register triton kernel for quantization (#15) Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * Fix version comparison issue (#18) # Description The version comparison expression omits the `.release` property on one of the version objects. This leads to a comparison between a tuple and a `Version` object. # Error message
```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```
--------- Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> Co-authored-by:
jiqing-feng <jiqing.feng@intel.com> Co-authored-by:
Er-Xin (Edwin) Shang <shangerxin@hotmail.com>
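The version comparison failure reported in (#18) can be reproduced without torch. The following is a minimal sketch, assuming the `packaging` library is installed. The helper names (`at_least`, `release_at_least`, `mixed_comparison_raises`) and the version strings are hypothetical and chosen only for illustration; the sketch does not claim to show the exact change made in the commit.

```python
from packaging import version


def at_least(installed: str, minimum: str) -> bool:
    """Compare two Version objects directly (both sides have the same type)."""
    return version.parse(installed) >= version.parse(minimum)


def release_at_least(installed: str, minimum: str) -> bool:
    """Compare the .release tuples on BOTH sides (tuple vs. tuple)."""
    return version.parse(installed).release >= version.parse(minimum).release


def mixed_comparison_raises(installed: str, minimum: str) -> bool:
    """Return True if comparing a .release tuple with a Version raises TypeError."""
    try:
        # Same shape as the failing expression in the traceback:
        # the left side is a tuple, the right side is a Version.
        version.parse(installed).release >= version.parse(minimum)
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(at_least("2.9.0", "2.9"))
    print(release_at_least("2.9.0", "2.9"))
    print(mixed_comparison_raises("2.9.0", "2.9"))
```

Either consistent form avoids the `TypeError`. The two are not interchangeable, though: `.release` drops pre-release and local segments, so a development build of a release compares differently under the two helpers.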
- 20 Jun, 2025 1 commit
pnunna93 authored
* Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip * Update 
kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py ---------
Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
- 16 Jun, 2025 1 commit
Chetan Kumar Verma authored
- 11 Jun, 2025 1 commit
Dmitrii Makarenko authored
* [xpu/triton] Add triton dequantization kernel This PR adds an xpu backend and a triton kernel for dequantization of the nf4 dtype. Triton is an optional import. Tests: tests/test_functional.py::TestQuantize4BitFunctional supported nf4/fp4 cases tests/test_functional.py::Test8BitBlockwiseQuantizeFunctional implemented quantize_blockwise with binary search that works faster for XPU tests/test_linear4bit.py Signed-off-by: Dmitrii Makarenko <dmitrii.makarenko@intel.com> * align with ipex code * enable test for ipex * test_kbit_backprop: skip no longer needed * remove unused --------- Signed-off-by:
Dmitrii Makarenko <dmitrii.makarenko@intel.com>
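The commit above mentions a `quantize_blockwise` implemented with binary search. The idea can be sketched in plain Python, under stated assumptions: this is not the Triton kernel from the PR, the function names are hypothetical, and the five-entry code table is a toy stand-in for a real quantization map. Each block is scaled by its absolute maximum, and each scaled value is mapped to the nearest entry of a sorted code table by bisection rather than a linear scan.

```python
from bisect import bisect_left


def nearest_code_index(code, x):
    """Index of the entry in the sorted list `code` closest to x (binary search)."""
    i = bisect_left(code, x)
    if i == 0:
        return 0
    if i == len(code):
        return len(code) - 1
    # x lies between code[i - 1] and code[i]; pick the closer neighbour.
    return i if code[i] - x < x - code[i - 1] else i - 1


def quantize_blockwise_sketch(values, code, blocksize):
    """Return (indices, absmax): one code index per value, one scale per block."""
    indices, absmax = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block)
        absmax.append(scale)
        for v in block:
            # Normalise into [-1, 1]; an all-zero block maps to 0.0.
            normalized = v / scale if scale else 0.0
            indices.append(nearest_code_index(code, normalized))
    return indices, absmax


if __name__ == "__main__":
    toy_code = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical sorted code table
    print(quantize_blockwise_sketch([2.0, -1.2, 0.1, 4.0], toy_code, blocksize=4))
```

With a code table of n entries, the lookup per value takes O(log n) comparisons instead of O(n), which is the usual motivation for using binary search here.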
- 02 Jun, 2025 1 commit
Matthew Douglas authored
* Tests: xfail opcheck for 4bit quantization with floating storage dtypes * Tests: xfail opcheck for 4bit quantization with floating storage dtypes * Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch * Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch
- 28 May, 2025 1 commit
jiqing-feng authored
* enable ipex Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix cpu 8bit quantization Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix int8 and nf4 cpu inference Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * add cpu fp4 and rem Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix dequantize nf4 xpu Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix ipex op Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix dequantize nf4 name Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix dequantize nf4 ipex Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix matmul8bitfp Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * enable cpu tests Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix quantize blockwise output shape Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix quant_storage bf16 and gemv cpu Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix cpu tests Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix xpu tests Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix lib Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * skip xpu dequantize blockwise op check Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix matmul8bit Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * skip not used function tests Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix matmul8bit fp Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * check ipex before MatMul8bitFp Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * update ipex install guide Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * update install guide Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix error log Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix error log Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * update comment Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * move torch op to default Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * revert ipex check Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix code table device Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix code table device Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> * fix xpu ops Signed-off-by:
jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by:
jiqing-feng <jiqing.feng@intel.com>
- 24 May, 2025 1 commit
Matthew Douglas authored
* General cleanup & test improvements * Tests: WA numpy 2 compat issue for torch<2.3 * Tests: update aarch64 cpu min torch version * Tests: update aarch64 cpu min torch version * Tests: update aarch64 cpu min torch version
- 13 May, 2025 1 commit
Matthew Douglas authored
* Improvements for testing suite * Add workflow for macOS arm64 CPU tests
- 28 Apr, 2025 1 commit
Matthew Douglas authored
* Additional 4bit CPU ops * Additional 4bit CPU ops * Implement additional device-agnostic ops and test updates * More test fixes * int8 tests passing * Fix feature flag for multi_backend
- 22 Apr, 2025 1 commit
Matthew Douglas authored
* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit * Make test suite more device-agnostic * Additional device agnostic tests * Additional device agnosticism for tests * Add BNB_TEST_DEVICE env var to manually select device for unit tests * Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit * Make test suite more device-agnostic * Additional device agnostic tests * Additional device agnosticism for tests * Add BNB_TEST_DEVICE env var to manually select device for unit tests * Small bugfix for int8 test * Exclude backward() from code coverage reports * Params4bit: don't try to quantize when moving to meta device
- 27 Mar, 2025 1 commit
Matthew Douglas authored
* Testing cleanup * More test cleanup * Additional deprecations/removals. * Skip benchmark, deprecated, slow tests by default
- 25 Mar, 2025 1 commit
Matthew Douglas authored
* Sketch out first custom op registration * Add note * Initial int8 op registration * Cleanup some deprecated functions. * Int8 ops updates; tests * Implement 4bit quant/dequant ops * Fix nested quant * cleanup * Test improvements * Clean up and improve tests * Add higher level custom op for int8 matmul + dequant + bias * Add gemv 4bit custom op * Cleanup * Implement out kwarg overloads for custom ops * Update PyTorch minimum to 2.1 * Deprecation updates * Deprecation updates * Cleanup; rename int8_linear_dequant -> int8_scaled_mm * Bump min pytorch to 2.2 * cleanup * Test reorganization * Remove deprecated supports_igemmlt * More cleanup * Cleanup obsolete C++/CUDA code * Cleanup * Create 'default' backend for fallback op implementations; initial CPU nf4 work * Stub out for multi-platform * Fix serialization tests for torch>=2.6.0 * Add example for torch.compile e2e inference * Test update ---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>