    LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d
    Matthew Douglas authored
    
    
    * Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
    
    * Fix unintended change
    
    * New naive mm_dequant kernel for row-major; cleanup
    
    * fix
    
    * int8 refactor: initial sparse decomp, cleanup
    
    * Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
    
    * int8: inference optimizations, some cleanup
    
    * int8: more tests passing, cleanup
    
    * int8 - more cleanup, most tests passing
    
    * int8: specify CUDA stream for int8 ops
    
    * perf: reduce overhead from getting cudaStream ptr
    
    * Mark some functions for deprecation.
    
    * int8 sparse decomp: small perf improvement
    
    * update setup.py
    
    * Update bitsandbytes/autograd/_functions.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * Update bitsandbytes/functional.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * Update bitsandbytes/functional.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * Update bitsandbytes/research/autograd/_functions.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
    
    * int8 cleanup
    
    * Ignore ruff rule ISC001 (incompatible with formatter)
    
    * add comment
    
    * int8 more cleanup
    
    * Update bitsandbytes/functional.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * int8: rename / deprecate old fn signatures
    
    * Update bitsandbytes/functional.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * type annotation
    
    * format update
    
    * Update bitsandbytes/research/autograd/_functions.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * cleanup
    
    * Add comment to explain division optimization
    
    * more cleanup
    
    * Update bitsandbytes/functional.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * Update bitsandbytes/functional.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * Update bitsandbytes/functional.py
    Co-authored-by: Aarni Koskela <akx@iki.fi>
    
    * cleanup
    
    * Type annotations, cleanup
    
    * remove unused kernels; improved type annotations
    
    * small perf optimization for single-GPU systems
    
    * small perf optimization for single-GPU systems
    
    * update docstrings
    
    * Improve docs and tests
    
    * Update docstring
    
    * Update test
    
    * add benchmarking script
    
    * test cleanup: add deprecated marker, move benchmarks out
    
    * Add int8 dequant function; misc improvements
    
    * int8 matmul fallback for inner dims not divisible by 4
    
    * improve register usage of kInt8VectorQuant - especially for A100/H100
    
    * disable fail-fast for package build
    
    * maxwell compat
    
    * ptxas verbose
    
    * docs update
    
    * doc update
    
    * backward fix
    
    * Bugfix sparse decomp
    
    * Int8 fix for PEFT OLoRA init
    
    * Fix test for deprecated spmm_coo
    
    * test improvement
    
    * doc update
    
    * typo
    
    * doc cleanup
    
    * docs
    
    * add inference benchmark script
    
    * Add benchmarks, doc update
    
    ---------
    Co-authored-by: Aarni Koskela <akx@iki.fi>
training_benchmark.py 6.71 KB