• Refactor model conversion (#296) · 823ad849
    Chen Xin authored
    * split deploy.py
    
    * fix get_cuda_tensor
    
    * deploy qwen_awq
    
    * fix lint
    
    * add docstring
    
    * fix
    
    * support baichuan/baichuan-awq
    
    * parameterize size_per_head
    
    * remove try/except
    
    * limit input model_format
    
    * add quant_path param
    
    * remove old deploy.py
    
    * fix path
    
    * fix transformer layer range when loading bins
    
    * fix qwen init
    
    * split & save log
    
    * relative import
    
    * update get_config
    
    * WeightFileMgr -> Reader
    
    * rename
    
    * update
    
    * fix init_layer_id
    
    * rename llama.py -> meta_llama.py, hf.py -> llama.py
    
    * reduce code
    
    * update arg description
    
    * fix meta llama
    
    * manually clean up meta model params
__init__.py 140 Bytes