"The engine supports [vlm inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py). \n",
"The engine supports [vlm inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states). \n",
"\n",
"\n",
"Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases."
"Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases."
@@ -17,7 +17,7 @@ The `/generate` endpoint accepts the following parameters in JSON format. For in
...
*`stream: bool = False` Whether to stream the output.
*`lora_path: Optional[Union[List[Optional[str]], Optional[str]]] = None` Path to LoRA weights.
*`custom_logit_processor: Optional[Union[List[Optional[str]], str]] = None` Custom logit processor for advanced sampling control. For usage see below.
-*`return_hidden_states: bool = False` Whether to return hidden states of the model. Note that each time it changes, the cuda graph will be recaptured, which might lead to a performance hit. See the [examples](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py) for more information.
+*`return_hidden_states: bool = False` Whether to return hidden states of the model. Note that each time it changes, the cuda graph will be recaptured, which might lead to a performance hit. See the [examples](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states) for more information.