Commit 4538bdae authored by wangkuigang-yewu-cmss

prevent rpc process from crashing on long prompt

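When a prompt exceeds cache_len, the rpc process crashes, which makes the whole service unavailable.
This adds a check so that overly long prompts are filtered out early in the request.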
parent 797dac7e
......@@ -374,6 +374,10 @@ class BalanceServeInterface(BackendInterfaceBase):
top_p = 0.0001
query_add.sample_options.top_p = top_p
query_add.estimated_length = min(self.args.cache_lens, query_length+self.args.max_new_tokens)
+ if query_add.estimated_length < query_add.query_length:
+     raise Exception(f'query too long: estimated_length={query_add.estimated_length} < query_length={query_add.query_length}')
query_id = self.sched_client.add_query(query_add)
queue = asyncio.Queue(maxsize=self.args.max_new_tokens)
self.queue_map[query_id] = queue
......
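
The guard in the hunk above can be read in isolation as follows. This is a minimal sketch, not the project's code: `Args` and `estimate_length` are hypothetical names introduced for illustration, only `cache_lens` and `max_new_tokens` come from the diff, and `ValueError` is swapped in for the bare `Exception` the commit raises. Because the estimate is clamped to `cache_lens`, it can fall below the prompt length only when the prompt itself is longer than the cache (assuming `max_new_tokens` is non-negative).

```python
from dataclasses import dataclass


@dataclass
class Args:
    """Hypothetical stand-in for the server arguments referenced in the diff."""
    cache_lens: int       # total cache capacity, in tokens
    max_new_tokens: int   # generation budget per request


def estimate_length(args: Args, query_length: int) -> int:
    """Clamp the estimated length to the cache size and reject prompts that cannot fit."""
    estimated_length = min(args.cache_lens, query_length + args.max_new_tokens)
    if estimated_length < query_length:
        # Reached only when query_length > cache_lens: the prompt alone overflows the cache.
        # ValueError is an assumption of this sketch; the commit raises a plain Exception.
        raise ValueError(
            f"query too long: estimated_length={estimated_length} "
            f"< query_length={query_length}"
        )
    return estimated_length


if __name__ == "__main__":
    args = Args(cache_lens=8, max_new_tokens=4)
    print(estimate_length(args, 3))   # short prompt: full generation budget fits
    print(estimate_length(args, 6))   # estimate is clamped to cache_lens
    try:
        estimate_length(args, 9)      # prompt longer than the cache
    except ValueError as err:
        print(err)
```

Raising before the query is handed to the scheduler means the failure stays in the request path, where it can be reported to the caller, and never reaches the rpc process.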