@@ -38,6 +38,8 @@ Encoder engines should be launched with the following flags:
...
- `--max-num-batched-tokens=<large value>` **(default: 2048)**: This flag controls the token scheduling budget per decoding step and is irrelevant to encoder-only instances. **Set it to a very high value (effectively unlimited) to bypass scheduler limitations.** The actual token budget is managed by the encoder cache manager.
- `--convert "mm_encoder_only"` **(Optional)**: Skips initialization of the language model to reduce device memory usage. **Models using this option must implement the `get_language_model_spec` interface.**
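As a rough illustration, the helper below assembles a launch command for an encoder-only instance using just the two flags listed above. It is a sketch under stated assumptions, not the project's launcher: the function name `build_encoder_command`, the model placeholder, and the value `1_000_000_000` standing in for "a very high value" are all hypothetical, and any other flags required for encoder engines are not included.

```python
import shlex
from typing import List

# Hypothetical stand-in for "a very high value (effectively unlimited)".
# The guide does not prescribe a number; this one is an assumption.
UNLIMITED_BATCHED_TOKENS = 1_000_000_000


def build_encoder_command(model: str,
                          skip_language_model: bool = False,
                          max_batched_tokens: int = UNLIMITED_BATCHED_TOKENS) -> List[str]:
    """Build an argv list for an encoder-only engine (hypothetical helper)."""
    argv = ["vllm", "serve", model]
    # Raise the per-step scheduling budget so it never limits the encoder;
    # the encoder cache manager governs the real token budget.
    argv.append(f"--max-num-batched-tokens={max_batched_tokens}")
    if skip_language_model:
        # Optional: skip loading the language model to save device memory.
        # Only valid for models implementing get_language_model_spec.
        argv += ["--convert", "mm_encoder_only"]
    return argv


if __name__ == "__main__":
    # Print a shell-ready command line for a placeholder model name.
    print(shlex.join(build_encoder_command("<your-model>", skip_language_model=True)))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues; `shlex.join` is used here only to show the equivalent one-line shell command.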
## Local media inputs
To support local image inputs (from your `MEDIA_PATH` directory), add the following flag to the encoder instance: