    [inference] Refactor inference architecture (#5057) · fd6482ad
    Xu Kai authored
    
    
    * [inference] support only TP (#4998)
    
    * support only tp
    
    * enable tp
    
    * add support for bloom (#5008)
    
    * [refactor] refactor gptq and smoothquant llama (#5012)
    
    * refactor gptq and smoothquant llama
    
    * fix import error
    
    * fix linear import torch-int
    
    * fix smoothquant llama import error
    
    * fix import accelerate error
    
    * fix bug
    
    * fix import smooth cuda
    
    * fix smoothcuda
    
    * [Inference Refactor] Merge chatglm2 with pp and tp (#5023)
    
    merge chatglm with pp and tp
    
    * [Refactor] remove useless inference code (#5022)
    
    * remove useless code
    
    * fix quant model
    
    * fix test import bug
    
    * mv original inference legacy
    
    * fix chatglm2
    
    * [Refactor] refactor policy search and quant type controlling in inference (#5035)
    
    * [Refactor] refactor policy search and quant type controlling in inference
    
    * [inference] update readme (#5051)
    
    * update readme
    
    * update readme
    
    * fix architecture
    
    * fix table
    
    * fix table
    
    * [inference] update example (#5053)
    
    * update example
    
    * fix run.sh
    
    * fix rebase bug
    
    * fix some errors
    
    * update readme
    
    * add some features
    
    * update interface
    
    * update readme
    
    * update benchmark
    
    * add requirements-infer
    
    ---------
    Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
run_benchmark.sh 1.29 KB