Unverified commit 159cc741 authored by Lianmin Zheng, committed by GitHub

Make the server random by default (#493)

parent 7d1ebc2d
@@ -22,10 +22,13 @@ On the other hand, if you see `token usage` very high and you frequently see war
 ### Tune `--dp-size` and `--tp-size`
 Data parallelism is better for throughput. When there is enough GPU memory, always favor data parallelism for throughput.
+### (Minor) Tune `--max-prefill-tokens`, `--mem-fraction-static`, `--max-running-requests`.
+If you see out of memory (OOM) errors, you can decrease these parameters.
+If OOM happens during prefill, try to decrease `--max-prefill-tokens`.
+If OOM happens during decoding, try to decrease `--max-running-requests`.
+You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
 ### (Minor) Tune `--schedule-heuristic`
 If you have many shared prefixes, use the default `--schedule-heuristic lpm`. `lpm` stands for longest prefix match.
 When you have no shared prefixes at all or you always send the requests with the shared prefixes together,
 you can try `--schedule-heuristic fcfs`. `fcfs` stands for first come first serve.
-### (Minor) Tune `--max-prefill-tokens`, `--mem-fraction-static`, `--max-running-requests`.
-If you see out of memory errors, you can decrease them. Otherwise, the default value should work well.
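To make the `lpm` vs `fcfs` choice concrete, here is a simplified Python sketch of how the two policies could order a waiting queue. It is not SGLang's scheduler: requests are plain token-ID lists, the cache is a hypothetical list of previously processed sequences, and `order_queue`, `longest_cached_prefix`, and `shared_prefix_len` are illustrative names. It only shows why `lpm` helps when prefixes are shared.

```python
from typing import List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the leading tokens that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_cached_prefix(req: Sequence[int], cache: Sequence[Sequence[int]]) -> int:
    """Tokens at the start of req that are already cached.

    Assumes every prefix of a cached sequence is reusable, as in a prefix tree.
    """
    return max((shared_prefix_len(req, c) for c in cache), default=0)


def order_queue(
    queue: List[List[int]], cache: Sequence[Sequence[int]], heuristic: str
) -> List[List[int]]:
    """Return the waiting requests in the order this toy scheduler would run them."""
    if heuristic == "fcfs":
        # First come, first served: keep arrival order.
        return list(queue)
    if heuristic == "lpm":
        # Longest prefix match: requests that reuse the most cached tokens run first.
        # sorted() is stable (also with reverse=True), so ties keep arrival order.
        return sorted(queue, key=lambda r: longest_cached_prefix(r, cache), reverse=True)
    raise ValueError(f"unknown heuristic: {heuristic!r}")


if __name__ == "__main__":
    cache = [[1, 2, 3, 4]]  # one previously processed prompt
    queue = [[9, 9], [1, 2, 7], [1, 2, 3, 4, 5]]  # arrival order
    print(order_queue(queue, cache, "lpm"))   # [[1, 2, 3, 4, 5], [1, 2, 7], [9, 9]]
    print(order_queue(queue, cache, "fcfs"))  # [[9, 9], [1, 2, 7], [1, 2, 3, 4, 5]]
```

With no shared prefixes every key is 0 and the stable sort leaves the queue in arrival order, so in that case `lpm` gains nothing over `fcfs`. A production scheduler would more likely look prefixes up in a tree than scan a list as this sketch does.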
@@ -2,6 +2,7 @@
 import argparse
 import dataclasses
+import random
 from typing import List, Optional, Union
@@ -32,7 +33,7 @@ class ServerArgs:
     # Other runtime options
     tp_size: int = 1
     stream_interval: int = 8
-    random_seed: int = 42
+    random_seed: Optional[int] = None
     # Logging
     log_level: str = "info"
@@ -72,6 +73,9 @@ class ServerArgs:
         elif self.additional_ports is None:
             self.additional_ports = []
+        if self.random_seed is None:
+            self.random_seed = random.randint(0, 1 << 30)
     @staticmethod
     def add_cli_args(parser: argparse.ArgumentParser):
         parser.add_argument(
...
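The effect of the `random_seed` change can be reproduced with a stripped-down dataclass. `SeedArgs` below is a hypothetical stand-in, not the real `ServerArgs`, and it assumes (as the surrounding `additional_ports` handling suggests) that the new check runs in `__post_init__`. Before this commit an unset seed meant 42 on every launch; after it, each launch draws a fresh seed from `random.randint(0, 1 << 30)` unless one is given.

```python
import dataclasses
import random
from typing import Optional


@dataclasses.dataclass
class SeedArgs:
    """Hypothetical stand-in for ServerArgs, keeping only the field this commit changes."""

    random_seed: Optional[int] = None  # None now means "pick one for me"

    def __post_init__(self):
        # Mirror of the added lines: draw a seed only when none was given.
        # `is None` (not a truthiness test) keeps an explicit seed of 0.
        if self.random_seed is None:
            self.random_seed = random.randint(0, 1 << 30)


if __name__ == "__main__":
    print(SeedArgs(random_seed=42).random_seed)  # 42: explicit seeds are kept
    drawn = SeedArgs().random_seed  # differs from run to run
    print(drawn)
    # To replay a run, record the drawn seed and pass it back explicitly.
    assert SeedArgs(random_seed=drawn).random_seed == drawn
```

Because `random.randint` includes both endpoints, a drawn seed lies in `[0, 2**30]`. Reproducibility is still available; it just has to be requested by passing a seed.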