    Add HQQ quantization support (#29637) · 59952994
    mobicham authored
    
    
    * update HQQ transformers integration
    
    * push import_utils.py
    
    * add force_hooks check in modeling_utils.py
    
    * fix `|` union type hints with Optional
    
    * force bias as param
    
    * check bias is Tensor
    
    * force forward for multi-gpu
    
    * review fixes pass
    
    * remove torch grad()
    
    * fix the `if any key in linear_tags` check
    
    * add cpu/disk check
    
    * isinstance return
    
    * add multigpu test + refactor tests
    
    * clean hqq_utils imports in hqq.py
    
    * clean hqq_utils imports in quantizer_hqq.py
    
    * delete hqq_utils.py
    
    * Delete src/transformers/utils/hqq_utils.py
    
    * ruff init
    
    * remove torch.float16 from __init__ in test
    
    * refactor test
    
    * isinstance -> type in quantizer_hqq.py
    
    * cpu/disk device_map check in quantizer_hqq.py
    
    * remove type(module) nn.linear check in quantizer_hqq.py
    
    * add BaseQuantizeConfig import inside HqqConfig init
    
    * remove hqq import in hqq.py
    
    * remove accelerate import from test_hqq.py
    
    * quant config.py doc update
    
    * add hqqconfig to main_classes doc
    
    * make style
    
    * __init__ fix
    
    * ruff __init__
    
    * skip_modules list
    
    * hqqconfig format fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * hqqconfig doc fix
    
    * test_hqq.py remove mistral comment
    
    * remove self.using_multi_gpu is False
    
    * set torch_dtype default value and add logger.info
    
    * hqq.py isinstance fix
    
    * remove torch=None
    
    * torch_device test_hqq
    
    * rename test_hqq
    
    * MODEL_ID in test_hqq
    
    * quantizer_hqq setattr fix
    
    * quantizer_hqq typo fix
    
    * imports quantizer_hqq.py
    
    * isinstance quantizer_hqq
    
    * hqq_layer.bias reformat quantizer_hqq
    
    * Step 2 as comment in quantizer_hqq
    
    * prepare_for_hqq_linear() comment
    
    * keep_in_fp32_modules fix
    
    * HqqHfQuantizer reformat
    
    * quantization.md hqqconfig
    
    * quantization.md model example reformat
    
    * quantization.md # space
    
    * quantization.md space   })
    
    * quantization.md space   })
    
    * quantization_config fix doc
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * axis value check in quantization_config
    
    * format
    
    * dynamic config explanation
    
    * quant config method in quantization.md
    
    * remove shard-level progress
    
    * .cuda fix modeling_utils
    
    * test_hqq fixes
    
    * make fix-copies
    
    ---------
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
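
    For reference, a minimal usage sketch of the HqqConfig API this PR introduces. The checkpoint name is arbitrary, and the nb_bits / group_size / axis parameter names are assumptions based on the HQQ BaseQuantizeConfig that HqqConfig wraps; check the released signature before relying on it.

        # Minimal sketch: 4-bit HQQ weight quantization applied at load time.
        # Assumed parameter names (nb_bits, group_size, axis) mirror HQQ's BaseQuantizeConfig.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

        model_id = "facebook/opt-125m"  # illustrative checkpoint, not the one used in test_hqq.py

        quant_config = HqqConfig(nb_bits=4, group_size=64, axis=1)

        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float16,        # HQQ uses a half-precision compute dtype
            device_map="cuda",                # cpu/disk offload is rejected by the HQQ quantizer
            quantization_config=quant_config,
        )

        tokenizer = AutoTokenizer.from_pretrained(model_id)
        inputs = tokenizer("Hello, my dog is cute", return_tensors="pt").to(model.device)
        print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))

    The commit list also mentions a per-layer dynamic config (different quantization settings per group of linear layers) and a skip_modules list (modules left unquantized); see the quantization.md changes for how those are passed to HqqConfig.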