- 30 Sep, 2025 1 commit
Matthew Douglas authored
* Update installation docs * Update links * Fix cuda min glibc in doc * Update header levels * Update AMD section * typo
- 29 Sep, 2025 2 commits
Matthew Douglas authored
Matthew Douglas authored
- 26 Sep, 2025 2 commits
Jun Jiang authored
* Update build-cuda.sh --------- Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Matthew Douglas authored
* Update workflow for packaging * Run the workflow when files change in `.github/scripts/**` * Shorter names for build jobs * Change Windows CUDA install back to "network" using subpackages * Update * Modify sub-packages for CUDA version handling Updated sub-packages to conditionally include 'crt' for CUDA 13. * Update CUDA sub-packages in workflow configuration * Change CUDA install method to 'local' for version 13 on Windows * Modify CUDA sub-packages for version 13 support * Change CUDA install method to 'network' in workflow * CUDA build script: only install security updates in container * CUDA build script: only install security updates in container * Pin macos build runner to macos-15 and windows to windows-2025 * ROCm build: remove unneeded build step
- 25 Sep, 2025 1 commit
Matthew Douglas authored
* Intel XPU: build and package binary for Linux * Update artifact name
- 24 Sep, 2025 1 commit
pnunna93 authored
* Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip
* Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py * warpSize is being made non constexpr in ROCm 7.0 * Merge pull request #90 from ROCm/IFU-rocm_enabled-09-23-2025 Ifu rocm enabled 09 23 2025 * Fix typo * unskip test_4bit_quant
--------- Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
Co-authored-by: sstamenk <strahinja.stamenkovic@amd.com>
- 23 Sep, 2025 1 commit
Matthew Douglas authored
* CUDA 13 build enablement * Try to fix Windows build workflow * Add torch 2.9+cu130 to tests * Fix python version * Update test workflow * Don't test CPU on torch 2.9 yet * Update doc
- 22 Sep, 2025 1 commit
Matthew Douglas authored
- 19 Sep, 2025 2 commits
Vivek Goel authored
* Add function to reverse 4bit weights for HPU * Fix lint error
YangKai0616 authored
- 18 Sep, 2025 1 commit
Mohamed Hisham authored
* Added branchless LUT-based dequantization for FP4 and NF4 * Added extra command line options to control reproducibility * Restore FP4 quantization/dequantization order
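The idea behind LUT-based 4-bit dequantization can be illustrated with a small Python sketch: each 4-bit code indexes a 16-entry table directly, so no per-value if/else chain over the 16 codes is needed, and the result is scaled by a per-block absmax. This is not the bitsandbytes CUDA kernel. The placeholder codebook (not the real NF4 or FP4 table), the high-nibble-first packing order, and the `dequantize_4bit` name are assumptions for illustration.

```python
# Sketch of LUT-based 4-bit dequantization (illustrative; not the bitsandbytes kernel).

# Placeholder 16-entry codebook spanning [-1, 1]; NOT the real NF4 or FP4 table.
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]


def dequantize_4bit(packed, absmax, blocksize, codebook=CODEBOOK):
    """Dequantize bytes that each hold two 4-bit codes.

    Assumed (hypothetical) layout: high nibble first, and one absmax
    scale per `blocksize` consecutive output values.
    """
    out = []
    for byte in packed:
        for code in (byte >> 4, byte & 0x0F):
            # A single table lookup replaces a branchy decision tree over the codes.
            scale = absmax[len(out) // blocksize]
            out.append(codebook[code] * scale)
    return out


print(dequantize_4bit(bytes([0x0F, 0xF0]), absmax=[2.0, 0.5], blocksize=2))
# → [-2.0, 2.0, 0.5, -0.5]
```

On a GPU the same lookup would typically run per thread with the table held in fast memory; the Python loop only shows the data flow.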
- 16 Sep, 2025 3 commits
Matthew Douglas authored
* Bump minimum PyTorch to 2.3 * Tests: Fix Windows numpy<2 compatibility for torch<2.4.1
Matthew Douglas authored
Egor authored
* implemented 8bit optimizers * Add interface * Commented out torch checks * Merged * Updated kernels * Reused code for quant/dequant * Removed empty line * Changed Readme
- 15 Sep, 2025 4 commits
Matthew Douglas authored
Liu Xiaoli authored
* Add SYCL Kernels for XPU backend
* fix transpose Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix log and format Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert cpu changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* clean ipex_xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* clean ipex import Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ipex cpu import Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix typo Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comments Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* refine gemv_4bit kernel
* enable FP4 for dequant_4bit and gemv_4bit
* refine FP4 dequantization performance
* remove check for better performance Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix doc Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* clean code
* fix tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rm comments Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix memory issue
* fix ut failure
* adjust threshold Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix xpu check Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* change test_functional check Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix test_module Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix device check Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Enable Windows build and refine code
* fix xpu log Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* remove ipex entirely Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cpu int8 CB Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix lint Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix logs (#12) * fix logs Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
--------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix sycl lint error and tests (#13) * fix sycl nd Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
--------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* skip typo check for xpu kernel codes (#14) * skip test for xpu ops Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix lint Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* skip typo for xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* skip Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* skip Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
--------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* register triton kernel for quantization (#15) Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix version comparison issue (#18)
# Description
The version comparison expression accessed the .release property on only one side, so it compared a tuple with a Version object, which raises a TypeError.
# Error message
```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```
--------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Er-Xin (Edwin) Shang <shangerxin@hotmail.com>
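The TypeError in that traceback comes from comparing a tuple (the `.release` of a parsed version) with a `Version` object. Keeping the comparison tuple-to-tuple avoids it. The stdlib-only sketch below uses a hypothetical `release_tuple` helper as a rough stand-in for `packaging.version.Version(...).release` (it does not implement full PEP 440 parsing) and zero-pads both tuples so that "2.9" and "2.9.0" compare equal. Comparing release tuples also means a nightly-style string such as "2.9.0.dev20250801+xpu" counts as 2.9, which a full PEP 440 `Version` comparison would not; treating that as the desired behavior is an assumption.

```python
import re


def release_tuple(version_str: str) -> tuple[int, ...]:
    """Hypothetical helper: leading numeric release segment of a version string.

    Rough stand-in for packaging.version.Version(version_str).release; it
    ignores pre/dev/local suffixes and does not implement full PEP 440.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_str)
    if match is None:
        raise ValueError(f"no release segment in {version_str!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version_str: str, minimum: str) -> bool:
    """Tuple-to-tuple comparison, zero-padded so (2, 9) == (2, 9, 0)."""
    have, need = release_tuple(version_str), release_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(at_least("2.9.0.dev20250801+xpu", "2.9"))  # True
print(at_least("2.8.0+cpu", "2.9"))  # False
```

Because the components are compared as integers, "2.10" correctly sorts after "2.9", which a plain string comparison would get wrong.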
YangKai0616 authored
* Implemented 32bit optimizers in triton * Modify Comments * Optimizing pure torch implementation * Restore the order of parameters and modify the position of pure pytorch implementation * Restore file permissions --------- Co-authored-by: Fanli Lin <fanli.lin@intel.com>
Matthew Douglas authored
- 09 Sep, 2025 1 commit
Matthew Douglas authored
* Test suite improvements for MPS/XPU/HPU * Skip test on torch==2.8.0+cpu for Windows regression
- 08 Sep, 2025 2 commits
Matthew Douglas authored
Matthew Douglas authored
* Add parametrize util for targeting parameters outside of nn.Linear modules * Parametrize 4bit: replace existing prequantized weight * cleanup * Add caching for parametrization * Add tests * Fix tests * Guard for torch < 2.5 * Guard for torch < 2.5 * Another test guard for torch >= 2.5
- 03 Sep, 2025 2 commits
kaixuanliu authored
* for Intel XPU case, use MatMul8bitFp even when not using IPEX Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* fix lint issue Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
--------- Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
jiqing-feng authored
* add int mm for xpu after torch 2.9 Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add packaging on pyproject Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
--------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
- 02 Sep, 2025 1 commit
Yuanyuan Chen authored
* Fix unused variable warnings and other ruff warnings Signed-off-by: cyy <cyyever@outlook.com>
* Fix format Signed-off-by: cyy <cyyever@outlook.com>
--------- Signed-off-by: cyy <cyyever@outlook.com>
- 25 Aug, 2025 1 commit
Yuanyuan Chen authored
Signed-off-by: cyy <cyyever@outlook.com>
- 11 Aug, 2025 4 commits
Matthew Douglas authored
Matthew Douglas authored
Matthew Douglas authored
Matthew Douglas authored
- 06 Aug, 2025 2 commits
Matthew Douglas authored
[CUDA] Fixing quantization uint8 packing bug for NF4 and FP4
Matthew Douglas authored
Fix Params4bit tensor subclass handling
- 04 Aug, 2025 1 commit
ved1beta authored
- 02 Aug, 2025 2 commits
ved1beta authored
Mohamed Hisham authored
- 31 Jul, 2025 1 commit
ved1beta authored
- 21 Jul, 2025 4 commits
Matthew Douglas authored
Add Volta support in cu128/cu129 builds
Matthew Douglas authored
Matthew Douglas authored
Create FUNDING.yml
Matthew Douglas authored