1. 20 Nov, 2023 1 commit
  2. 19 Nov, 2023 1 commit
      [inference] Refactor inference architecture (#5057) · fd6482ad
      Xu Kai authored
      
      
      * [inference] support only TP (#4998)
      
      * support only tp
      
      * enable tp
      
      * add support for bloom (#5008)
      
      * [refactor] refactor gptq and smoothquant llama (#5012)
      
      * refactor gptq and smoothquant llama
      
      * fix import error
      
      * fix linear import torch-int
      
      * fix smoothquant llama import error
      
      * fix import accelerate error
      
      * fix bug
      
      * fix import smooth cuda
      
      * fix smoothcuda
      
      * [Inference Refactor] Merge chatglm2 with pp and tp (#5023)
      
      merge chatglm with pp and tp
      
      * [Refactor] remove useless inference code (#5022)
      
      * remove useless code
      
      * fix quant model
      
      * fix test import bug
      
      * move original inference code to legacy
      
      * fix chatglm2
      
      * [Refactor] refactor policy search and quant type controlling in inference (#5035)
      
      * [Refactor] refactor policy search and quant type controlling in inference
      
      * [inference] update readme (#5051)
      
      * update readme
      
      * update readme
      
      * fix architecture
      
      * fix table
      
      * fix table
      
      * [inference] update example (#5053)
      
      * update example
      
      * fix run.sh
      
      * fix rebase bug
      
      * fix some errors
      
      * update readme
      
      * add some features
      
      * update interface
      
      * update readme
      
      * update benchmark
      
      * add requirements-infer
      
      ---------
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
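The "support only TP" work above shards model weights across devices. As a purely illustrative sketch (not the ColossalAI implementation; all names here are hypothetical), column-style tensor parallelism splits a linear layer's output columns across ranks, and the full output is the concatenation of per-rank partial results:

```python
# Hypothetical sketch of column-parallel weight sharding, the idea behind
# "support only TP": each rank keeps a slice of the output columns, and the
# full result is the concatenation of the per-rank partial outputs.

def matmul(x, w):
    """Plain dense matmul: x is (m x k), w is (k x n), result is (m x n)."""
    m, k, n = len(x), len(w), len(w[0])
    return [[sum(x[i][t] * w[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

def shard_columns(w, world_size):
    """Split the weight's output columns evenly across `world_size` ranks."""
    n = len(w[0])
    per_rank = n // world_size
    return [[row[r * per_rank:(r + 1) * per_rank] for row in w]
            for r in range(world_size)]

def column_parallel_matmul(x, w, world_size):
    """Each rank multiplies with its shard; the gather step concatenates
    each row's partial outputs back into the full output row."""
    shards = shard_columns(w, world_size)
    partials = [matmul(x, shard) for shard in shards]  # one per rank
    return [sum((p[i] for p in partials), []) for i in range(len(x))]
```

In a real setup the per-rank matmuls run on separate devices and the gather is a collective; here they run serially only to show that sharding preserves the result.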
  3. 27 Oct, 2023 1 commit
      [Pipeline inference] Combine kvcache with pipeline inference (#4938) · 1db67276
      Bin Jia authored
      * merge kvcache with pipeline inference and refactor the code structure
      
      * support ppsize > 2
      
      * refactor pipeline code
      
      * do pre-commit
      
      * modify benchmark
      
      * fix benchmark
      
      * polish code
      
      * add docstring and update readme
      
      * refactor the code
      
      * fix some logic bugs in ppinfer
      
      * polish readme
      
      * fix typo
      
      * skip infer test
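The commit above merges a KV cache into pipeline inference. As a toy sketch (not the ColossalAI implementation; `KVCache` and `decode_step` are hypothetical names), the core idea is that each decoding step appends only the newest token's key/value pair instead of recomputing attention inputs for the whole prefix:

```python
# Toy sketch of the KV-cache idea behind "merge kvcache with pipeline
# inference": generation appends one key/value per step and reuses the rest.

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        return self.keys, self.values  # full history seen so far

    def __len__(self):
        return len(self.keys)

def decode_step(cache, new_key, new_value):
    """One generation step: only the newest token's K/V is computed.
    Attention would run over the returned history; we report its size."""
    keys, values = cache.append(new_key, new_value)
    return len(keys), len(values)
```

In the pipeline setting each stage holds the cache for its own layers, which is what makes combining the two nontrivial.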
  4. 18 Oct, 2023 1 commit
  5. 11 Oct, 2023 1 commit
      [Pipeline Inference] Sync pipeline inference branch to main (#4820) · 08a9f76b
      Bin Jia authored
      * [pipeline inference] pipeline inference (#4492)
      
      * add pp stage manager as circle stage
      
      * fix a bug when create process group
      
      * add ppinfer basic framework
      
      * add micro batch manager and support kvcache-pp gpt2 fwd
      
      * add generate schedule
      
      * use mb size to control mb number
      
      * support generate with kv cache
      
      * add output, remove unused code
      
      * add test
      
      * reuse shardformer to build model
      
      * refactor some code and use the same attribute name of hf
      
      * fix review and add test for generation
      
      * remove unused file
      
      * fix CI
      
      * add cache clear
      
      * fix code error
      
      * fix typo
      
      * [Pipeline inference] Modify to tieweight (#4599)
      
      * add pp stage manager as circle stage
      
      * fix a bug when create process group
      
      * add ppinfer basic framework
      
      * add micro batch manager and support kvcache-pp gpt2 fwd
      
      * add generate schedule
      
      * use mb size to control mb number
      
      * support generate with kv cache
      
      * add output, remove unused code
      
      * add test
      
      * reuse shardformer to build model
      
      * refactor some code and use the same attribute name of hf
      
      * fix review and add test for generation
      
      * remove unused file
      
      * modify the way of saving new tokens
      
      * modify to tieweight
      
      * modify test
      
      * remove unused file
      
      * solve review
      
      * add docstring
      
      * [Pipeline inference] support llama pipeline inference (#4647)
      
      * support llama pipeline inference
      
      * remove tie weight operation
      
      * [pipeline inference] Fix the blocking of communication when ppsize is 2 (#4708)
      
      * add benchmark verbose
      
      * fix export tokens
      
      * fix benchmark verbose
      
      * add P2POp style to do p2p communication
      
      * modify schedule as p2p type when ppsize is 2
      
      * remove unused code and add docstring
      
      * [Pipeline inference] Refactor code, add docsting, fix bug (#4790)
      
      * add benchmark script
      
      * update argparse
      
      * fix fp16 load
      
      * refactor code style
      
      * add docstring
      
      * polish code
      
      * fix test bug
      
      * [Pipeline inference] Add pipeline inference docs (#4817)
      
      * add readme doc
      
      * add an icon
      
      * Add performance
      
      * update table of contents
      
      * refactor code (#4873)
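One bullet above, "use mb size to control mb number", captures a small but useful design point: the pipeline schedule derives the number of micro batches from the requested micro batch size rather than fixing a count up front. A minimal sketch of that arithmetic (hypothetical helper names, not the ColossalAI API):

```python
# Minimal sketch of "use mb size to control mb number": the micro batch
# count follows from the micro batch size and the total batch size.

def split_micro_batches(batch, mb_size):
    """Partition a batch into micro batches of at most `mb_size` items."""
    if mb_size <= 0:
        raise ValueError("mb_size must be positive")
    return [batch[i:i + mb_size] for i in range(0, len(batch), mb_size)]

def micro_batch_count(batch_size, mb_size):
    """Ceiling division: how many micro batches the schedule will run."""
    return -(-batch_size // mb_size)
```

Deriving the count this way keeps the last (possibly smaller) micro batch valid instead of forcing the batch size to divide evenly.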