- Benchmark a single static batch by running the following command without launching a server. The arguments are the same as for `launch_server.py`. Note that this is not a dynamic batching server, so it may run out of memory at a batch size that a real server could handle: a real server splits the prefill into several batches, while this unit test does not. For accurate large-batch testing, consider using `sglang.bench_serving`.
See also [json_decode.py](examples/usage/json_decode.py) for an additional example of specifying formats with Pydantic models.
#### Batching
Use `run_batch` to run a batch of requests with continuous batching.
...
...
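For reference, here is a minimal sketch of batch execution, assuming a running SGLang server at `http://localhost:30000`; the endpoint address and the `text_qa` program are illustrative.

```python
import sglang as sgl

# Point the frontend at a running SGLang server (the address is an assumption).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def text_qa(s, question):
    s += "Q: " + question + "\n"
    s += "A:" + sgl.gen("answer", stop="\n")

# Submit the whole list as one batch; the runtime schedules the requests
# with continuous batching under the hood.
states = text_qa.run_batch(
    [
        {"question": "What is the capital of the United Kingdom?"},
        {"question": "What is the capital of France?"},
        {"question": "What is the capital of Japan?"},
    ],
    progress_bar=True,
)

for state in states:
    print(state["answer"])
```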
- The `choices` argument in `sgl.gen` is implemented by computing the [token-length normalized log probabilities](https://blog.eleuther.ai/multiple-choice-normalization/) of all choices and selecting the one with the highest normalized score (see the first sketch after this list).
- The `regex` argument in `sgl.gen` is implemented through autoregressive decoding with logit bias masking, which enforces the constraints derived from the regex at every step. It works with both greedy decoding (`temperature=0`) and sampling (`temperature != 0`); see the second sketch after this list.
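To make the `choices` selection concrete, here is a standalone sketch of token-length normalized scoring. The `choice_token_logprobs` callable is a hypothetical helper standing in for a real model call, so this illustrates the idea rather than SGLang's internals.

```python
import math
from typing import Callable, List


def select_choice(
    prompt: str,
    choices: List[str],
    choice_token_logprobs: Callable[[str, str], List[float]],
) -> str:
    """Pick the choice whose mean per-token log probability is highest.

    `choice_token_logprobs(prompt, choice)` is a placeholder that should return
    the log probability of each token of `choice`, conditioned on `prompt`.
    """
    best_choice, best_score = None, -math.inf
    for choice in choices:
        logprobs = choice_token_logprobs(prompt, choice)
        # Token-length normalization: average instead of sum, so longer
        # choices are not penalized just for having more tokens.
        score = sum(logprobs) / len(logprobs)
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice
```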
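And here is a schematic of the logit-bias-masking loop behind `regex`. The model call and the regex-state-to-allowed-tokens lookup are placeholders, so this is an illustration of the general technique, not SGLang's implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Set


def _softmax(xs: Sequence[float]) -> List[float]:
    m = max(x for x in xs if x != -math.inf)
    exps = [math.exp(x - m) if x != -math.inf else 0.0 for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],  # placeholder model call
    allowed_tokens: Callable[[List[int]], Set[int]],            # placeholder regex-state lookup
    eos_id: int,
    max_tokens: int = 64,
    temperature: float = 0.0,
) -> List[int]:
    """Autoregressive decoding where tokens the regex forbids are masked out."""
    output: List[int] = []
    for _ in range(max_tokens):
        logits = list(next_token_logits(output))
        allowed = allowed_tokens(output)
        # Logit bias masking: set disallowed tokens to -inf so they can never be chosen.
        for tok in range(len(logits)):
            if tok not in allowed:
                logits[tok] = -math.inf
        if temperature == 0:
            tok_id = max(range(len(logits)), key=logits.__getitem__)  # greedy decoding
        else:
            probs = _softmax([l / temperature for l in logits])
            tok_id = random.choices(range(len(logits)), weights=probs, k=1)[0]
        output.append(tok_id)
        if tok_id == eos_id:
            break
    return output
```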