1. 06 May, 2024 1 commit
  2. 02 May, 2024 1 commit
    • Add HQQ quantization support (#29637) · 59952994
      mobicham authored
      
      
      * update HQQ transformers integration
      
      * push import_utils.py
      
      * add force_hooks check in modeling_utils.py
      
      * fix | with Optional
      
      * force bias as param
      
      * check bias is Tensor
      
      * force forward for multi-gpu
      
      * review fixes pass
      
      * remove torch grad()
      
      * if any key in linear_tags fix
      
      * add cpu/disk check
      
      * isinstance return
      
      * add multigpu test + refactor tests
      
      * clean hqq_utils imports in hqq.py
      
      * clean hqq_utils imports in quantizer_hqq.py
      
      * delete hqq_utils.py
      
      * Delete src/transformers/utils/hqq_utils.py
      
      * ruff init
      
      * remove torch.float16 from __init__ in test
      
      * refactor test
      
      * isinstance -> type in quantizer_hqq.py
      
      * cpu/disk device_map check in quantizer_hqq.py
      
      * remove type(module) nn.linear check in quantizer_hqq.py
      
      * add BaseQuantizeConfig import inside HqqConfig init
      
      * remove hqq import in hqq.py
      
      * remove accelerate import from test_hqq.py
      
      * quant config.py doc update
      
      * add hqqconfig to main_classes doc
      
      * make style
      
      * __init__ fix
      
      * ruff __init__
      
      * skip_modules list
      
      * hqqconfig format fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * test_hqq.py remove mistral comment
      
      * remove self.using_multi_gpu is False
      
      * torch_dtype default val set and logger.info
      
      * hqq.py isinstance fix
      
      * remove torch=None
      
      * torch_device test_hqq
      
      * rename test_hqq
      
      * MODEL_ID in test_hqq
      
      * quantizer_hqq setattr fix
      
      * quantizer_hqq typo fix
      
      * imports quantizer_hqq.py
      
      * isinstance quantizer_hqq
      
      * hqq_layer.bias reformat quantizer_hqq
      
      * Step 2 as comment in quantizer_hqq
      
      * prepare_for_hqq_linear() comment
      
      * keep_in_fp32_modules fix
      
      * HqqHfQuantizer reformat
      
      * quantization.md hqqconfig
      
      * quantization.md model example reformat
      
      * quantization.md # space
      
      * quantization.md space   })
      
      * quantization.md space   })
      
      * quantization_config fix doc
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * axis value check in quantization_config
      
      * format
      
      * dynamic config explanation
      
      * quant config method in quantization.md
      
      * remove shard-level progress
      
      * .cuda fix modeling_utils
      
      * test_hqq fixes
      
      * make fix-copies
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
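
      The commit list above references the new HqqConfig class, its quantization.md documentation, the dynamic per-layer config, and the skip_modules default, but does not show the resulting API in full. A minimal usage sketch based on those changes could look like the following; the model id and the bit/group-size values are illustrative, not taken from the commit:

      ```python
      # Sketch of loading a model with the HqqConfig added in this PR.
      # Requires the hqq package (pip install hqq).
      import torch
      from transformers import AutoModelForCausalLM, HqqConfig

      # One config shared by every linear layer (4-bit weights, group size 64).
      quant_config = HqqConfig(nb_bits=4, group_size=64)

      # Or a per-layer ("dynamic") config, as described in the quantization.md update:
      # attention projections at 4 bits, MLP projections at 3 bits (illustrative values).
      q4 = {"nb_bits": 4, "group_size": 64}
      q3 = {"nb_bits": 3, "group_size": 32}
      quant_config = HqqConfig(
          dynamic_config={
              "self_attn.q_proj": q4,
              "self_attn.k_proj": q4,
              "self_attn.v_proj": q4,
              "self_attn.o_proj": q4,
              "mlp.gate_proj": q3,
              "mlp.up_proj": q3,
              "mlp.down_proj": q3,
          }
      )

      # HQQ quantizes the weights on the fly while the checkpoint loads; the
      # quantizer checks the device_map for cpu/disk, so place the model on GPU.
      model = AutoModelForCausalLM.from_pretrained(
          "meta-llama/Llama-2-7b-hf",  # illustrative model id
          torch_dtype=torch.float16,
          device_map="cuda",
          quantization_config=quant_config,
      )
      ```

      Per the skip_modules default referenced in the commits, lm_head is left unquantized unless that list is overridden.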