    [Pipeline Inference] Sync pipeline inference branch to main (#4820) · 08a9f76b
    Bin Jia authored
    * [pipeline inference] pipeline inference (#4492)
    
    * add pp stage manager as circle stage
    
    * fix a bug when creating process group
    
    * add ppinfer basic framework
    
    * add micro batch manager and support kvcache-pp gpt2 fwd
    
    * add generate schedule
    
    * use mb size to control mb number
    
    * support generate with kv cache
    
    * add output, remove unused code
    
    * add test
    
    * reuse shardformer to build model
    
    * refactor some code and use the same attribute name of hf
    
    * fix review and add test for generation
    
    * remove unused file
    
    * fix CI
    
    * add cache clear
    
    * fix code error
    
    * fix typo
    
    * [Pipeline inference] Modify to tieweight (#4599)
    
    * add pp stage manager as circle stage
    
    * fix a bug when creating process group
    
    * add ppinfer basic framework
    
    * add micro batch manager and support kvcache-pp gpt2 fwd
    
    * add generate schedule
    
    * use mb size to control mb number
    
    * support generate with kv cache
    
    * add output, remove unused code
    
    * add test
    
    * reuse shardformer to build model
    
    * refactor some code and use the same attribute name of hf
    
    * fix review and add test for generation
    
    * remove unused file
    
    * modify the way of saving newtokens
    
    * modify to tieweight
    
    * modify test
    
    * remove unused file
    
    * solve review
    
    * add docstring
    
    * [Pipeline inference] support llama pipeline inference (#4647)
    
    * support llama pipeline inference
    
    * remove tie weight operation
    
    * [pipeline inference] Fix the blocking of communication when ppsize is 2 (#4708)
    
    * add benchmark verbose
    
    * fix export tokens
    
    * fix benchmark verbose
    
    * add P2POp style to do p2p communication
    
    * modify schedule as p2p type when ppsize is 2
    
    * remove unused code and add docstring
    
    * [Pipeline inference] Refactor code, add docstring, fix bug (#4790)
    
    * add benchmark script
    
    * update argparse
    
    * fix fp16 load
    
    * refactor code style
    
    * add docstring
    
    * polish code
    
    * fix test bug
    
    * [Pipeline inference] Add pipeline inference docs (#4817)
    
    * add readme doc
    
    * add a ico
    
    * Add performance
    
    * update table of contents
    
    * refactor code (#4873)
gpt2.py 14.3 KB