Each bucketing range is configured with the following parameters:

| Name | Description |
| --- | --- |
| `min` | Determines the lowest value of the bucket range. |
| `step` | Determines the interval between buckets. |
| `max` | Determines the upper bound of the bucket range. |
| Ramp-up phase | A special handling phase applied between `min` and `step`:<br/>- `min` is multiplied by consecutive powers of two until `step` is reached.<br/>- Minimizes padding waste for small batch sizes.<br/>- Allows larger padding for larger batches. |
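
The parameters above can be illustrated with a small Python sketch. The helpers `bucket_range` and `pad_to_bucket` are hypothetical names chosen for illustration, not vLLM's API. The sketch assumes that ramp-up buckets are `min` doubled while strictly below `step`, that the remaining buckets are multiples of `step` up to and including `max`, and that an input is padded up to the smallest bucket that can hold it.

```python
from bisect import bisect_left
from typing import List


def bucket_range(bmin: int, bstep: int, bmax: int) -> List[int]:
    """Hypothetical helper: list the buckets for one dimension (e.g. batch size).

    Assumption: this mirrors the min/step/max description above; it is a
    sketch, not vLLM's implementation.
    """
    if bmin <= 0 or bstep <= 0 or bmin > bmax:
        raise ValueError("expected min > 0, step > 0 and min <= max")

    # Ramp-up phase: min, 2*min, 4*min, ... while the value is still below step.
    ramp_up = []
    value = bmin
    while value < bstep and value <= bmax:
        ramp_up.append(value)
        value *= 2

    # Stable phase: multiples of step, up to and including max.
    # Values below min (possible only when min > step) are skipped.
    stable = [b for b in range(bstep, bmax + 1, bstep) if b >= bmin]
    return ramp_up + stable


def pad_to_bucket(value: int, buckets: List[int]) -> int:
    """Hypothetical helper: return the smallest bucket that can hold `value`."""
    # bucket_range produces buckets in ascending order, so a binary search suffices.
    idx = bisect_left(buckets, value)
    if idx == len(buckets):
        raise ValueError(f"{value} exceeds the largest bucket {buckets[-1]}")
    return buckets[idx]


if __name__ == "__main__":
    with_ramp_up = bucket_range(2, 32, 64)
    without_ramp_up = bucket_range(32, 32, 64)
    print(with_ramp_up)     # [2, 4, 8, 16, 32, 64]
    print(without_ramp_up)  # [32, 64]
    for bs in (3, 17, 33):
        padded = pad_to_bucket(bs, with_ramp_up)
        print(bs, "->", padded, "padding:", padded - bs)
```

Under these assumptions, a batch of 3 is padded to 4 when the ramp-up phase is present, but to 32 when `min` equals `step` and there is no ramp-up. Larger batches such as 33 still absorb more padding (up to 64), which is the trade-off described in the table.
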
Example (with ramp-up):
...
...
- `VLLM_{phase}_{dim}_BUCKET_{param}` - a collection of 12 environment variables that configure the ranges of the bucketing mechanism
* `{phase}` is either `PROMPT` or `DECODE`
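* `{dim}` is either `BS`, `SEQ` or `BLOCK`; `SEQ` applies only to the `PROMPT` phase and `BLOCK` only to the `DECODE` phase, which is why there are 12 variables rather than 18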
* `{param}` is either `MIN`, `STEP` or `MAX`
* Default values:
- Prompt:
- batch size min (`VLLM_PROMPT_BS_BUCKET_MIN`): `1`