- 27 Mar, 2025 1 commit
-
-
Matthew Douglas authored
-
- 25 Mar, 2025 3 commits
-
-
Matthew Douglas authored
-
Matthew Douglas authored
* Sketch out first custom op registration
* Add note
* Initial int8 op registration
* Cleanup some deprecated functions.
* Int8 ops updates; tests
* Implement 4bit quant/dequant ops
* Fix nested quant
* cleanup
* Test improvements
* Clean up and improve tests
* Add higher level custom op for int8 matmul + dequant + bias
* Add gemv 4bit custom op
* Cleanup
* Implement out kwarg overloads for custom ops
* Update PyTorch minimum to 2.1
* Deprecation updates
* Deprecation updates
* Cleanup; rename int8_linear_dequant -> int8_scaled_mm
* Bump min pytorch to 2.2
* cleanup
* Test reorganization
* Remove deprecated supports_igemmlt
* More cleanup
* Cleanup obsolete C++/CUDA code
* Cleanup
* Create 'default' backend for fallback op implementations; initial CPU nf4 work
* Stub out for multi-platform
* Fix serialization tests for torch>=2.6.0
* Add example for torch.compile e2e inference
* Test update
---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
-
Matthew Douglas authored
-
- 19 Mar, 2025 1 commit
-
-
Titus authored
-
- 13 Mar, 2025 1 commit
-
-
Titus authored
-
- 07 Mar, 2025 1 commit
-
-
Ethan Kiang authored
-
- 25 Feb, 2025 2 commits
-
-
Matthew Douglas authored
-
dependabot[bot] authored
Bumps the minor-patch group with 1 update in the / directory: [ruff](https://github.com/astral-sh/ruff).

Updates `ruff` from 0.6.9 to 0.9.6
* [Release notes](https://github.com/astral-sh/ruff/releases)
* [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
* [Commits](https://github.com/astral-sh/ruff/compare/0.6.9...0.9.6)

updated-dependencies:
* dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 24 Feb, 2025 4 commits
-
-
Matthew Douglas authored
-
Matthew Douglas authored
-
Matthew Douglas authored
-
Matthew Douglas authored
-
- 20 Feb, 2025 1 commit
-
-
Titus authored
-
- 19 Feb, 2025 4 commits
-
-
Matthew Douglas authored
-
Matthew Douglas authored
-
Fx Morin authored
-
Mitchell Goff authored
-
- 06 Feb, 2025 6 commits
-
-
Matthew Douglas authored
-
Matthew Douglas authored
-
Matthew Douglas authored
-
Matthew Douglas authored
-
Matthew Douglas authored
-
Matthew Douglas authored
-
- 28 Jan, 2025 2 commits
-
-
Johnny authored
* blackwell
* blackwell
* Update python-package.yml
-
Matthew Douglas authored
-
- 23 Jan, 2025 3 commits
-
-
Matthew Douglas authored
-
Aarni Koskela authored
* Exclude tests from distribution
  Fixes #1478
* Update pyproject.toml
* Update pyproject.toml
---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
-
Matthew Douglas authored
-
- 22 Jan, 2025 1 commit
-
-
Johnny authored
* initial support blackwell
* Update CHANGELOG.md
  Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update CMakeLists.txt
* Update CHANGELOG.md
* fix build-cuda.sh
* fix build-cuda.sh
* fix cuda 12.7 build-cuda.sh
* Update build-cuda.sh
* Update cuda from 12.6.2 to 12.6.3
* Update .github/workflows/python-package.yml
  Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update install_cuda.py
  Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update install_cuda.sh
  Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update .github/scripts/build-cuda.sh
* Update install_cuda.sh
---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
-
- 14 Jan, 2025 3 commits
-
-
Benjamin Badger authored
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
-
Matthew Douglas authored
-
Matthew Douglas authored
* (chore) Remove unused dotfiles
* cleanup: remove unused kernels/C++ code
-
- 17 Dec, 2024 2 commits
-
-
Saurav Maheshkar authored
* chore: move configs to pyproject.toml
* fix: drop file from CI workflow
* feat: reorder pytest markers
* chore: retain comments
* chore(build): migrate build data to pyproject
  Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* chore: move configs to pyproject.toml
* Apply suggestions from code review
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* bump ruff
---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>
-
Bert Maher authored
* Remove triton.ops, copy necessary bits here
  Summary: Triton upstream removed `triton.ops` and moved it to a semi-unmaintained `kernels` repo. Since all that's needed here is the perf model, just add those bits here.
* Add source reference/license comment
-
- 11 Dec, 2024 1 commit
-
-
Matthew Douglas authored
-
- 10 Dec, 2024 2 commits
-
-
Matthew Douglas authored
-
Huazhong Ji authored
-
- 05 Dec, 2024 2 commits
-
-
Matthew Douglas authored
-
Matthew Douglas authored
* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
* Fix unintended change
* New naive mm_dequant kernel for row-major; cleanup
* fix
* int8 refactor: initial sparse decomp, cleanup
* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
* int8: inference optimizations, some cleanup
* int8: more tests passing, cleanup
* int8 - more cleanup, most tests passing
* int8: specify CUDA stream for int8 ops
* perf: reduce overhead from getting cudaStream ptr
* Mark some functions for deprecation.
* int8 sparse decomp: small perf improvement
* update setup.py
* Update bitsandbytes/autograd/_functions.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/research/autograd/_functions.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
* int8 cleanup
* Ignore ruff rule ISC001 (incompatible with formatter)
* add comment
* int8 more cleanup
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* int8: rename / deprecate old fn signatures
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* type annotation
* format update
* Update bitsandbytes/research/autograd/_functions.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* cleanup
* Add comment to explain division optimization
* more cleanup
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* cleanup
* Type annotations, cleanup
* remove unused kernels; improved type annotations
* small perf optimization for single-GPU systems
* small perf optimization for single-GPU systems
* update docstrings
* Improve docs and tests
* Update docstring
* Update test
* add benchmarking script
* test cleanup: add deprecated marker, move benchmarks out
* Add int8 dequant function; misc improvements
* int8 matmul fallback for inner dims not divisible by 4
* improve register usage of kInt8VectorQuant - especially for A100/H100
* disable fail-fast for package build
* maxwell compat
* ptxas verbose
* docs update
* doc update
* backward fix
* Bugfix sparse decomp
* Int8 fix for PEFT OLoRA init
* Fix test for deprecated spmm_coo
* test improvement
* doc update
* typo
* doc cleanup
* docs
* add inference benchmark script
* Add benchmarks, doc update
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
-