    [Inference] Dynamic Batching Inference, online and offline (#4953) · cf579ff4
    Jianghai authored
    
    
    * [inference] Dynamic Batching for Single and Multiple GPUs (#4831)
    
    * finish batch manager
    
    * 1
    
    * first
    
    * fix
    
    * fix dynamic batching
    
    * llama infer
    
    * finish test
    
    * support different lengths generating
    
    * del prints
    
    * del prints
    
    * fix
    
    * fix bug
    
    ---------
    
    Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
    
    * [inference] Async dynamic batching  (#4894)
    
    * finish input and output logic
    
    * add generate
    
    * test forward
    
    * 1
    
    * [inference]Re push async dynamic batching (#4901)
    
    * adapt to ray server
    
    * finish async
    
    * finish test
    
    * del test
    
    ---------
    Co-authored-by: yuehuayingxueluo <867460659@qq.com>
    
    * Revert "[inference]Re push async dynamic batching (#4901)" (#4905)
    
    This reverts commit fbf3c09e673794ed18c91d4bab1a7dfea052e95a.
    
    * Revert "[inference] Async dynamic batching  (#4894)"
    
    This reverts commit fced14025043e29ce816b315f440601188f7f79f.
    
    * Revert "[inference] Async dynamic batching  (#4894)" (#4909)
    
    This reverts commit fced14025043e29ce816b315f440601188f7f79f.
    
    * Add Ray Distributed Environment Init Scripts
    
    * support DynamicBatchManager base function
    
    * revert _set_tokenizer version
    
    * add driver async generate
    
    * add async test
    
    * fix bugs in test_ray_dist.py
    
    * add get_tokenizer.py
    
    * fix code style
    
    * fix bugs about No module named 'pydantic' in ci test
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * [infer]Add Ray Distributed Environment Init Scripts (#4911)
    
    * Revert "[inference] Async dynamic batching  (#4894)"
    
    This reverts commit fced14025043e29ce816b315f440601188f7f79f.
    
    * Add Ray Distributed Environment Init Scripts
    
    * support DynamicBatchManager base function
    
    * revert _set_tokenizer version
    
    * add driver async generate
    
    * add async test
    
    * fix bugs in test_ray_dist.py
    
    * add get_tokenizer.py
    
    * fix code style
    
    * fix bugs about No module named 'pydantic' in ci test
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * fix bugs in ci test
    
    * support dynamic batch for bloom model and is_running function
    
    * [Inference]Test for new Async engine (#4935)
    
    * infer engine
    
    * infer engine
    
    * test engine
    
    * test engine
    
    * new manager
    
    * change step
    
    * add
    
    * test
    
    * fix
    
    * fix
    
    * finish test
    
    * finish test
    
    * finish test
    
    * finish test
    
    * add license
    
    ---------
    Co-authored-by: yuehuayingxueluo <867460659@qq.com>
    
    * add assertion for config (#4947)
    
    * [Inference] Finish dynamic batching offline test (#4948)
    
    * test
    
    * fix test
    
    * fix quant
    
    * add default
    
    * fix
    
    * fix some bugs
    
    * fix some bugs
    
    * fix
    
    * fix bug
    
    * fix bugs
    
    * reset param
    
    ---------
    Co-authored-by: yuehuayingxueluo <867460659@qq.com>
    Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
    Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
config.yaml 393 Bytes