    Support loading hf model directly (#685) · 6b00f623
    Chen Xin authored
    * turbomind support export model params
    
    * fix overflow
    
    * support turbomind.from_pretrained
    
    * fix tp
    
    * support AutoModel
    
    * support load kv qparams
    
    * update auto_awq
    
    * update docstring
    
    * export lmdeploy version
    
    * update doc
    
    * remove download_hf_repo
    
    * LmdeployForCausalLM -> LmdeployForCausalLM
    
    * refactor turbomind.py
    
    * update comment
    
    * add bfloat16 convert back
    
    * support loading hf models in gradio run_local
    
    * support loading hf models in the restful API server
    
    * add docs
    
    * support loading previous quantized model
    
    * adapt to PR #690
    
    * update docs
    
    * do not export turbomind config when quantizing a model
    
    * check model_name when it cannot be read from config.json
    
    * update readme
    
    * remove model_name in auto_awq
    
    * update
    
    * update
    
    * update
    
    * fix build
    
    * absolute import
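
    The commits above add a `from_pretrained`-style entry point that accepts either a local model directory or a Hugging Face Hub repo id. A minimal sketch of that dispatch logic, assuming a local path is used as-is and anything else is treated as a hub repo id (the helper name `resolve_model_source` and its return values are hypothetical, not lmdeploy's actual API):

    ```python
    import os

    def resolve_model_source(model_path: str) -> str:
        """Hypothetical helper: decide how to load a model argument.

        A path that exists as a directory is loaded directly from disk;
        any other string is assumed to be a Hugging Face Hub repo id
        (e.g. "org/model-name") to be downloaded before loading.
        """
        if os.path.isdir(model_path):
            return "local"
        # Not a directory on disk: treat it as a hub repo id.
        return "hub"
    ```

    A real implementation would then either read the directory directly or download the repo (e.g. via `huggingface_hub`) before handing it to the turbomind loader; the point of the sketch is only the local-vs-hub dispatch.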