Support Runtime tensor parallelism (#158)
* works on internlm and vicuna
* support GQA
* remove comment
* update readme, add logger, default tp=1
* remove log
lmdeploy/turbomind/utils.py
0 → 100644