1. 11 Jun, 2025 1 commit
  2. 08 Jun, 2025 1 commit
  3. 06 Jun, 2025 1 commit
  4. 04 Jun, 2025 1 commit
      Deprecation cleanup (#1669) · 849d9449
      Matthew Douglas authored
      * Deprecation cleanup: remove histogram_scatter_add_2d
      
      * Deprecation cleanup: vectorwise_mm_dequant
      
      * Deprecation cleanup: vectorwise_quant
      
      * Remove unused test
      
      * Optimizer test cleanup
      
      * Deprecations: remove estimate_quantiles, create_quantile_map
      
      * Move deprecated test
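      For readers following these removals: a common pattern is to ship a warning shim for a release or two before deleting a function outright. A minimal sketch of such a shim, using a hypothetical decorator (illustrative only, not the library's actual code):

        import warnings
        from functools import wraps

        def deprecated(msg: str):
            # Emit a FutureWarning at each call site before delegating.
            def decorator(fn):
                @wraps(fn)
                def wrapper(*args, **kwargs):
                    warnings.warn(msg, FutureWarning, stacklevel=2)
                    return fn(*args, **kwargs)
                return wrapper
            return decorator

        @deprecated("estimate_quantiles is deprecated and will be removed.")
        def estimate_quantiles(tensor, num_quantiles=256):
            ...  # original implementation kept until the cleanup release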
  5. 03 Jun, 2025 1 commit
  6. 02 Jun, 2025 2 commits
      Add CPU + IPEX to nightly CI (#1667) · 318a86e3
      Matthew Douglas authored
      * Tests: add linux x64 cpu+ipex to nightly CI workflow
      
      * typo
      
      * Tests: guard linear8bit compile test for ipex cpu issue
      Fix CI regression (#1666) · 945f7c1d
      Matthew Douglas authored
      * Tests: xfail opcheck for 4bit quantization with floating storage dtypes
      
      * Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch
  7. 28 May, 2025 1 commit
  8. 24 May, 2025 2 commits
      Add torch.compile tests (#1648) · 9f858294
      Matthew Douglas authored
      * Add torch.compile tests
      
      * Tests: WA aarch64 CPU regressions for torch 2.6.0; add Windows torch==2.7.0+cu118 test config
      
      * Tests: skip torch.compile for cuda on windows
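      Tests of this kind typically compile a module and check parity with eager mode. A minimal sketch, assuming a CUDA device and using a plain nn.Linear as a stand-in for a bitsandbytes layer:

        import torch

        def test_compile_matches_eager():
            torch._dynamo.reset()  # avoid reusing compiled artifacts across tests
            model = torch.nn.Linear(64, 64, device="cuda", dtype=torch.float16)
            x = torch.randn(8, 64, device="cuda", dtype=torch.float16)

            expected = model(x)  # eager reference
            actual = torch.compile(model, fullgraph=True)(x)

            torch.testing.assert_close(actual, expected)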
      General cleanup & test improvements (#1646) · 503d243e
      Matthew Douglas authored
      * General cleanup & test improvements
      
      * Tests: WA numpy 2 compat issue for torch<2.3
      
      * Tests: update aarch64 cpu min torch version
  9. 21 May, 2025 1 commit
  10. 19 May, 2025 1 commit
      CI runner updates (#1643) · cdcae8d3
      Matthew Douglas authored
      * Test g5g runner
      
      * Switch L4 to L40S runner; swap GitHub Linux T4 runner for AWS g4dn
      
      * Run tests on last 2 pytorch stable releases
  11. 13 May, 2025 1 commit
  12. 29 Apr, 2025 1 commit
      Set up nightly CI for unit tests (#1619) · a5dd01bb
      Matthew Douglas authored
      * Run unit tests on GH Actions
      
      * fix
      
      * trigger workflow
      
      * Update
      
      * Run tests nightly
      
      * Disable paged optimizer test on Windows
      
      * Skip unit tests on Windows for CUDA 12.x (driver on runner is too old)
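      The Windows exclusions above are ordinary pytest platform guards. A sketch of the shape such a guard takes (the test name and body are placeholders, not the suite's actual tests):

        import platform
        import pytest

        @pytest.mark.skipif(
            platform.system() == "Windows",
            reason="paged optimizers are not supported on Windows",
        )
        def test_paged_optimizer_step():
            ...  # exercise the paged optimizer here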
  13. 28 Apr, 2025 1 commit
  14. 22 Apr, 2025 1 commit
      Updates for device agnosticism (#1601) · 1088ec52
      Matthew Douglas authored
      * Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit
      
      * Make test suite more device-agnostic
      
      * Additional device agnostic tests
      
      * Additional device agnosticism for tests
      
      * Add BNB_TEST_DEVICE env var to manually select device for unit tests
      
      * Small bugfix for int8 test
      
      * Exclude backward() from code coverage reports
      
      * Params4bit: don't try to quantize when moving to meta device
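      The BNB_TEST_DEVICE variable mentioned above lets a run pin the device under test. A sketch of how a conftest fixture might consume it (the fixture is illustrative; the suite's actual wiring may differ):

        import os
        import pytest
        import torch

        @pytest.fixture
        def device() -> str:
            # Honor an explicit override, else prefer an accelerator if present.
            override = os.environ.get("BNB_TEST_DEVICE")
            if override:
                return override
            return "cuda" if torch.cuda.is_available() else "cpu"

        def test_tensor_on_selected_device(device):
            t = torch.ones(4, device=device)
            assert t.device.type == torch.device(device).type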
  15. 27 Mar, 2025 2 commits
  16. 25 Mar, 2025 1 commit
      PyTorch Custom Operator Integration (#1544) · e82f72b3
      Matthew Douglas authored
      
      
      * Sketch out first custom op registration
      
      * Add note
      
      * Initial int8 op registration
      
      * Cleanup some deprecated functions.
      
      * Int8 ops updates; tests
      
      * Implement 4bit quant/dequant ops
      
      * Fix nested quant
      
      * cleanup
      
      * Test improvements
      
      * Clean up and improve tests
      
      * Add higher level custom op for int8 matmul + dequant + bias
      
      * Add gemv 4bit custom op
      
      * Cleanup
      
      * Implement out kwarg overloads for custom ops
      
      * Update PyTorch minimum to 2.1
      
      * Deprecation updates
      
      * Cleanup; rename int8_linear_dequant -> int8_scaled_mm
      
      * Bump min pytorch to 2.2
      
      * cleanup
      
      * Test reorganization
      
      * Remove deprecated supports_igemmlt
      
      * More cleanup
      
      * Cleanup obsolete C++/CUDA code
      
      * Cleanup
      
      * Create 'default' backend for fallback op implementations; initial CPU nf4 work
      
      * Stub out for multi-platform
      
      * Fix serialization tests for torch>=2.6.0
      
      * Add example for torch.compile e2e inference
      
      * Test update
      
      ---------
      Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
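      The registrations this PR introduces are built on torch.library, which gives each op a schema, a real implementation, and a shape-only "fake" implementation for torch.compile tracing. A minimal sketch of the pattern using the torch>=2.4 decorator API and a made-up demo namespace (not bitsandbytes' actual schemas):

        import torch

        @torch.library.custom_op("demo::dequantize_scale", mutates_args=())
        def dequantize_scale(a: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
            # Eager implementation: rescale an int8 tensor back to float.
            return a.to(torch.float32) * scale

        @dequantize_scale.register_fake
        def _(a: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
            # Metadata-only impl so the op can be traced without running.
            return torch.empty_like(a, dtype=torch.float32)

        x = (torch.randn(4, 4) * 10).to(torch.int8)
        out = torch.ops.demo.dequantize_scale(x, torch.tensor(0.1))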
  17. 05 Dec, 2024 1 commit
      LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d
      Matthew Douglas authored
      
      
      * Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
      
      * Fix unintended change
      
      * New naive mm_dequant kernel for row-major; cleanup
      
      * fix
      
      * int8 refactor: initial sparse decomp, cleanup
      
      * Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
      
      * int8: inference optimizations, some cleanup
      
      * int8: more tests passing, cleanup
      
      * int8 - more cleanup, most tests passing
      
      * int8: specify CUDA stream for int8 ops
      
      * perf: reduce overhead from getting cudaStream ptr
      
      * Mark some functions for deprecation.
      
      * int8 sparse decomp: small perf improvement
      
      * update setup.py
      
      * Update bitsandbytes/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
      
      * int8 cleanup
      
      * Ignore ruff rule ISC001 (incompatible with formatter)
      
      * add comment
      
      * int8 more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8: rename / deprecate old fn signatures
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * type annotation
      
      * format update
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Add comment to explain division optimization
      
      * more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Type annotations, cleanup
      
      * remove unused kernels; improved type annotations
      
      * small perf optimization for single-GPU systems
      
      * update docstrings
      
      * Improve docs and tests
      
      * Update docstring
      
      * Update test
      
      * add benchmarking script
      
      * test cleanup: add deprecated marker, move benchmarks out
      
      * Add int8 dequant function; misc improvements
      
      * int8 matmul fallback for inner dims not divisible by 4
      
      * improve register usage of kInt8VectorQuant - especially for A100/H100
      
      * disable fail-fast for package build
      
      * maxwell compat
      
      * ptxas verbose
      
      * docs update
      
      * doc update
      
      * backward fix
      
      * Bugfix sparse decomp
      
      * Int8 fix for PEFT OLoRA init
      
      * Fix test for deprecated spmm_coo
      
      * test improvement
      
      * doc update
      
      * typo
      
      * doc cleanup
      
      * docs
      
      * add inference benchmark script
      
      * Add benchmarks, doc update
      
      ---------
      Co-authored-by: Aarni Koskela <akx@iki.fi>
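      For orientation, the row-major int8 path refactored here follows the standard absmax scheme: quantize each row to int8, accumulate the matmul in int32, then dequantize with the product of the row scales. A numerically illustrative sketch in pure PyTorch (the int8_scaled_mm below is a stand-in echoing the op renamed in this PR, not its actual CUDA implementation):

        import torch

        def quantize_rowwise(x):
            # Per-row absmax scaling into the int8 range [-127, 127].
            absmax = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
            return torch.round(x * (127.0 / absmax)).to(torch.int8), absmax

        def int8_scaled_mm(a, b, scale_a, scale_b):
            # int32 accumulation, then dequantize with the outer product of scales.
            acc = a.to(torch.int32) @ b.to(torch.int32).T
            return acc.to(torch.float32) * (scale_a * scale_b.T) / (127.0 * 127.0)

        A, B = torch.randn(8, 64), torch.randn(16, 64)
        qa, sa = quantize_rowwise(A)
        qb, sb = quantize_rowwise(B)
        ref = A @ B.T
        err = (int8_scaled_mm(qa, qb, sa, sb) - ref).norm() / ref.norm()
        assert err < 0.02  # quantization error stays small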
  18. 20 Sep, 2024 2 commits
  19. 14 Aug, 2024 1 commit
  20. 06 Aug, 2024 1 commit
  21. 15 Jul, 2024 1 commit
  22. 30 May, 2024 1 commit
      FIX Make Int8Params deepcopy-able · ed99b3c1
      Benjamin Bossan authored
      This requires implementing the __deepcopy__ method in Int8Params.
      Moreover, there was an issue in the Linear8BitLT constructor that would
      assign instance attributes to the class, which is now fixed.
      
      Please review carefully that this does not impact existing code.
      
      Tests that I ran:
      
      - pytest tests/test_linear8bitlt.py
      - in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py
      - in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py
      - in transformers: RUN_SLOW=1 python -m pytest tests/quantization/bnb -x
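      The heart of the fix is an explicit __deepcopy__ on the parameter subclass, so copy.deepcopy rebuilds the subclass with its quantization state instead of decaying to a plain Parameter. A simplified sketch of the idea (the scale field is illustrative, not the exact Int8Params attributes):

        import copy
        import torch

        class QuantParam(torch.nn.Parameter):
            def __new__(cls, data, scale=None, requires_grad=False):
                self = super().__new__(cls, data, requires_grad=requires_grad)
                self.scale = scale  # stand-in for quantization state
                return self

            def __deepcopy__(self, memo):
                # Reconstruct the subclass so its extra attributes survive.
                new = type(self)(
                    self.data.clone(),
                    scale=copy.deepcopy(self.scale, memo),
                    requires_grad=self.requires_grad,
                )
                memo[id(self)] = new
                return new

        p = QuantParam(torch.zeros(2, 2), scale=torch.ones(2))
        q = copy.deepcopy(p)
        assert q.scale is not p.scale and torch.equal(q, p)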
  23. 29 May, 2024 1 commit
  24. 02 Apr, 2024 1 commit
  25. 29 Mar, 2024 1 commit
  26. 13 Mar, 2024 2 commits
  27. 11 Mar, 2024 2 commits
  28. 06 Mar, 2024 1 commit
  29. 05 Mar, 2024 1 commit
  30. 21 Feb, 2024 3 commits
  31. 05 Feb, 2024 1 commit
  32. 01 Feb, 2024 1 commit