We tried vLLM 0.5.3.post1, but it often crashes under high load, so we use the older version, vLLM 0.5.2, instead.
The preparation steps for TensorRT-LLM are described at https://github.com/sgl-project/tensorrt-demo. Specifically, we used a batch size of 512, a max input length of 8192, and a max number of tokens of 8192. The instance count for the preprocessing and postprocessing models in Triton Server is set to 16.
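For reference, below is a minimal sketch of how these settings could be applied when following the tensorrt-demo setup. The checkpoint and output paths are placeholders, the flag names follow the `trtllm-build` CLI and may differ across TensorRT-LLM versions, and the `instance_group` snippet assumes the standard `preprocessing`/`postprocessing` models from the Triton TensorRT-LLM backend repository.

```bash
# Build the TensorRT-LLM engine with the settings described above.
# Paths are placeholders; flag names follow the trtllm-build CLI and may
# vary between TensorRT-LLM versions.
trtllm-build \
    --checkpoint_dir ./model_checkpoint \
    --output_dir ./model_engine \
    --max_batch_size 512 \
    --max_input_len 8192 \
    --max_num_tokens 8192

# In the Triton model repository, set the instance count to 16 in the
# config.pbtxt of both the preprocessing and postprocessing models, e.g.:
#
#   instance_group [
#     {
#       count: 16
#       kind: KIND_CPU
#     }
#   ]
```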