vllm/model_executor/models/llama.py · 84e4e37d1427bd254bea6ff366026773d36f3982 · kecinstone / 2024pra-vllm · GitLab

support sharding llama2-70b on more than 8 GPUs (#1209) · a60b3530
Zhuohan Li authored Oct 02, 2023
```
Co-authored-by: JiCheng <247153481@qq.com>
```

llama.py 16.7 KB